Data compression and decompression

ABSTRACT

A computer-implemented method for compressing an n-bit data value, the method comprising dividing the n bits of the data value into a first subset of bits and a second subset of bits, the first subset comprising the n−2 most significant bits of the data value and the second subset comprising the two least significant bits of the data value; performing compression of the first subset using a first compression module; and performing compression of the second subset using a second compression module, the first and second compression modules implementing different compression schemes.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority under 35 U.S.C. 119 from United Kingdom patent application Nos. 2204487.9, 2204486.1 and 2204484.6, all filed on 29 Mar. 2022, which are herein incorporated by reference in their entirety.

TECHNICAL FIELD

The present disclosure relates to data compression and decompression.

BACKGROUND

Data compression, both lossless and lossy, is desirable in many applications in which data is to be stored in, and/or read from, memory. By compressing data before storage of the data in memory, the amount of data transferred to the memory may be reduced. An example of data for which data compression is particularly useful is image data. The term ‘image data’ is used herein to refer to two-dimensional data that has values corresponding to respective pixel or sample locations of an image. For example, the image may be produced as part of a rendering process on a Graphics Processing Unit (GPU). Image data may include, but is not limited to, depth data to be stored in a depth buffer, pixel data (e.g. colour data) to be stored in a frame buffer, texture data to be stored in a texture buffer, surface normal data to be stored in a surface normal buffer and lighting data to be stored in a lighting buffer. These buffers may be any suitable type of memory, such as cache memory, separate memory subsystems, memory areas in a shared memory system or some combination thereof.

A GPU may be used to process data in order to generate image data. For example, a GPU may determine pixel values (e.g. colour values) of an image to be stored in a frame buffer which may be output to a display. GPUs usually have highly parallelised structures for processing large blocks of data in parallel. There is significant commercial pressure to make GPUs (especially those intended to be implemented on mobile/embedded devices) operate with reduced latency, reduced power consumption and with a reduced physical size, e.g. a reduced silicon area. Competing against these aims is a desire to use higher quality rendering algorithms to produce higher quality images. Reducing the memory bandwidth (i.e. reducing the amount of data transferred between the GPU and a memory) can significantly reduce the latency and the power consumption of the system, which is why compressing the data before transferring the data can be particularly useful. The same is true, to a lesser extent, when considering data being moved around within the GPU itself. Furthermore, the same issues may be relevant for other processing units, e.g. central processing units (CPUs), as well as GPUs.

It is therefore desirable to compress image data to be stored in a frame buffer while maintaining high quality images after decompression. It is desirable to compress and decompress image data so that the decompressed image data accurately reflects the image data as it was pre-compression. Different formats may be used for image data. For example, the image data can be single-channel or multi-channel image data. In a common example, pixel values may be represented with values in four channels, e.g. red, green, blue and alpha channels, and in a common example, the data is stored using 8 bits per channel (8 bpc) such that 32 bits are used to represent the data for a pixel. Compression and decompression units may be configured (e.g. in hardware such as fixed function circuitry) specifically to compress and decompress 8-bit data values.

However, not all data is represented as 8 bpc. Another (slightly less) common type of image data is 10 bpc (i.e. 10 bits per channel) data. In other words, in these 10 bpc examples, for a single channel of each pixel, the data value comprises 10 bits of information. Each pixel may have data in one or more channels. For example, each pixel may have data in three channels, where the channels may represent red, green and blue values for the pixel or may represent Y, U and V values for the pixel to give two examples. Compression and decompression units which are configured specifically to compress and decompress 8-bit data values would generally not be suitable for compressing and decompressing 10-bit data values. However, it would add a lot of silicon area (e.g. in a GPU) to have compression and decompression units configured in hardware to compress and decompress 10-bit data values in addition to the compression and decompression units configured to compress and decompress 8-bit data values.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

According to a first embodiment there is provided a computer-implemented method for compressing an n-bit data value, the method comprising dividing the n bits of the data value into a first subset of bits and a second subset of bits, the first subset comprising the n−2 most significant bits of the data value and the second subset comprising the two least significant bits of the data value; performing compression of the first subset using a first compression module; and performing compression of the second subset using a second compression module, the first and second compression modules implementing different compression schemes.

The method may require that (n−2)=2^(x), wherein x is an integer.

The method may require that n=6 or n=10 or n=18.

The first compression module may compress the first subset by 50%.

The compression of the first subset may be independent of the compression of the second subset.

The data value may represent image data.

The method may further comprise storing in memory the result of the compression of the first subset and the result of the compression of the second subset.

The second compression module may compress the second subset by at least 62.5% and/or by no more than 66.7%.

For a group of four n-bit data values, compression of the second subsets of bits across the group of four n-bit data values may comprise storing four bits comprising a second least significant bit of each of the data values in the group; and one bit indicative of a least significant bit for the group of four data values.

The one bit indicative of a least significant bit for the group of four data values may be generated using a Boolean expression of the least significant bits of the four data values in the group.

The second compression module may compress the second subset by 50%.

Compression of the second subset may comprise, for each data value, storing the second least significant bit.

For a group of m n-bit data values comprising m second subsets of bits, compression of the second subsets of bits may comprise mapping the second subsets of bits collectively onto an m-bit encoding, the m-bit encoding being selected from 2^(m) m-bit encodings, the 2^(m) m-bit encodings comprising a first group of encodings comprising (2^(m)−4) m-bit encodings and a second group of encodings comprising four m-bit encodings, wherein if the selected encoding is an encoding from the first group of encodings then the selected encoding represents a group of m second subsets of bits in which the second least significant bit of each second subset is the same as a respective bit of the m-bit encoding, and wherein if the selected encoding is an encoding from the second group of encodings then the selected encoding represents a group of m second subsets of bits in which all of the second subsets of bits in the group are equal.

The method may require that m=4.

The selected encoding may represent the m second subsets of bits with no greater error than any of the other 2^(m) m-bit encodings would represent the m second subsets of bits.

According to a second embodiment, there is provided a compression unit configured to compress an n-bit data value, the compression unit comprising dividing logic configured to divide the n bits of the data value into a first subset of bits and a second subset of bits, the first subset comprising the n−2 most significant bits of the data value and the second subset comprising the two least significant bits of the data value; a first compression module configured to implement a first compression scheme to compress the first subset; and a second compression module configured to implement a second compression scheme to compress the second subset, wherein the first and second compression schemes are different.

The compression unit may require that (n−2)=2^(x), wherein x is an integer.

The second compression module may be configured to, for a group of four n-bit data values, compress the second subsets of bits across the group of four n-bit data values by determining four bits comprising a second least significant bit of each of the data values in the group; and one bit indicative of a least significant bit for the group of four data values.

The second compression module may compress the second subset by 50%.

The second compression module may be configured to, for a group of m n-bit data values comprising m second subsets of bits, compress the second subsets of bits by mapping the second subsets of bits collectively onto an m-bit encoding, wherein the second compression module is configured to select the m-bit encoding from 2^(m) m-bit encodings, the 2^(m) m-bit encodings comprising a first group of encodings comprising (2^(m)−4) m-bit encodings and a second group of encodings comprising four m-bit encodings, wherein the second compression module is configured such that if the selected encoding is an encoding from the first group of encodings then the selected encoding represents a group of m second subsets of bits in which the second least significant bit of each second subset is the same as a respective bit of the m-bit encoding, and wherein the second compression module is configured such that if the selected encoding is an encoding from the second group of encodings then the selected encoding represents a group of m second subsets of bits in which all of the second subsets of bits in the group are equal.

The compression unit may require that m=4.

The compression unit may be embodied in hardware on an integrated circuit.

There is also provided an integrated circuit definition dataset that, when processed in an integrated circuit manufacturing system, configures the integrated circuit manufacturing system to manufacture the compression unit.

There is further provided a compression unit configured to perform the computer-implemented method of the first embodiment.

There is provided computer readable code configured to cause the method of the first embodiment to be performed when the code is run.

The compression and/or decompression units may be embodied in hardware on an integrated circuit. There may be provided a method of manufacturing, at an integrated circuit manufacturing system, a compression unit and/or a decompression unit. There may be provided an integrated circuit definition dataset that, when processed in an integrated circuit manufacturing system, configures the system to manufacture a compression unit and/or a decompression unit. There may be provided a non-transitory computer readable storage medium having stored thereon a computer readable description of a compression unit and/or a decompression unit that, when processed in an integrated circuit manufacturing system, causes the integrated circuit manufacturing system to manufacture an integrated circuit embodying a compression unit and/or a decompression unit.

There may be provided an integrated circuit manufacturing system comprising: a non-transitory computer readable storage medium having stored thereon a computer readable description of the compression unit and/or the decompression unit; a layout processing system configured to process the computer readable description so as to generate a circuit layout description of an integrated circuit embodying the compression unit and/or the decompression unit; and an integrated circuit generation system configured to manufacture the compression unit and/or the decompression unit according to the circuit layout description.

There may be provided computer program code for performing any of the methods described herein. There may be provided a non-transitory computer readable storage medium having stored thereon computer readable instructions that, when executed at a computer system, cause the computer system to perform any of the methods described herein.

The above features may be combined as appropriate, as would be apparent to a skilled person, and may be combined with any of the aspects of the examples described herein.

BRIEF DESCRIPTION OF THE DRAWINGS

Examples will now be described in detail with reference to the accompanying drawings in which:

FIG. 1A shows the compression of a 10-bit data value.

FIG. 1B shows the decompression of a compressed 10-bit data value.

FIG. 2 shows data in the PACK16 format.

FIG. 3A shows the compression of data in the PACK16 format.

FIG. 3B shows the decompression of compressed data in the PACK16 format.

FIG. 4 shows data in the PACK10 format.

FIG. 5A shows the compression of data in the PACK10 format.

FIG. 5B shows the decompression of compressed data in the PACK10 format.

FIG. 6 shows a graphics processing unit comprising a compression unit, a decompression unit and a memory.

FIG. 7 shows a computer system in which a graphics processing unit comprising a compression unit and a decompression unit is implemented; and

FIG. 8 shows an integrated circuit manufacturing system for generating an integrated circuit embodying a compression unit and/or a decompression unit.

The accompanying drawings illustrate various examples. The skilled person will appreciate that the illustrated element boundaries (e.g., boxes, groups of boxes, or other shapes) in the drawings represent one example of the boundaries. It may be that in some examples, one element may be designed as multiple elements or that multiple elements may be designed as one element. Common reference numerals are used throughout the figures, where appropriate, to indicate similar features.

DETAILED DESCRIPTION

The following description is presented by way of example to enable a person skilled in the art to make and use the invention. The present invention is not limited to the embodiments described herein and various modifications to the disclosed embodiments will be apparent to those skilled in the art.

Embodiments will now be described by way of example only.

In examples described herein, an n-bit data value is compressed by dividing the n bits of the data value into a first subset of bits and a second subset of bits, the first subset comprising the n−2 most significant bits of the data value and the second subset comprising the two least significant bits of the data value. The two least significant bits of the data value are the least significant bit and the second least significant bit of the data value. The second least significant bit of the n-bit data value is the most significant bit of the second subset of bits. Then compression of the first subset is performed using a first compression module, and compression of the second subset is performed using a second compression module, where the first and second compression modules implement different compression schemes. In common examples, n is even and thus (n−2) is even. In further examples, (n−2)=2^(x), wherein x is an integer. To give some specific examples, n may be 6, 10 or 18, and in most of the examples described in detail herein n=10. This means that the first compression module can be configured specifically for compressing (n−2)-bit data values, where (n−2) is a power of 2. It is likely that the first compression module will be useful for compressing other data values. For example, where n=10, a compression module for compressing 8-bit data values is likely to be present in a GPU (for other purposes), and this module can be used as the first compression module. Then all that is needed, in addition to this first compression module, in order to provide the functionality for compressing 10-bit data values, is the second compression module which is configured for compressing 2-bit data values. A compression module which is configured for compressing 2-bit data values will tend to be smaller (in terms of silicon area) than a compression module which is configured for compressing 10-bit data values.

Furthermore, because the two bits of the data values that the second compression module is configured to compress are the two least significant bits (“LSBs”) of the data values (i.e. there are more significant bits in the data values than the two bits which are processed by the second compression module), the second compression module can be adapted specifically for this purpose. For example, when LSBs are compressed, the ability to perfectly represent constant regions of values (e.g. patches of pixel values which all have the same value) is perceptually more important to the quality of the compressed and decompressed images than the ability to perfectly represent noisy regions of values (e.g. patches of pixel values which have different values). So the compression scheme implemented by the second compression module in examples described herein allows for constant regions to be perfectly represented at the cost of sometimes introducing greater errors for noisy regions. Overall, this provides a perceptually higher quality compression and decompression process. This is because small errors (e.g. errors in the LSBs of data values) in regions of an image that are supposed to be constant are more noticeable to human visual systems than errors of the same magnitude in regions of an image that are supposed to be noisy.

In a system for compressing and decompressing 10-bit data values, high image quality can be maintained by achieving the following aims during compression and decompression. In particular, it would be desirable for a compression and decompression scheme to be able to:

1. Support the full range of 10-bit values (i.e. values from 0 to 1023 inclusive).
2. Preserve as much of the 9^(th) bit information as possible. The 10^(th) bit is generally noisier than the 9^(th) bit and therefore contains less structural information. It is therefore less important to preserve the information stored in the 10^(th) bit than that stored in the 9^(th) bit.
3. Minimise the maximal (and/or mean) square error introduced by the compression and decompression.
4. Perfectly represent 10-bit data values which are constant over regions at a chosen granularity because errors in constant regions are more noticeable than errors in non-constant regions.

It is also desirable that the compression and decompression schemes are sufficiently simple to be performed on-the-fly by GPUs operating with minimal power consumption and physical size.

There are some simple schemes for compressing the 9^(th) and 10^(th) bits which are improved upon by examples described herein. A first simple scheme is referred to as a “No bit replication” scheme. In this scheme, neither the 9^(th) nor the 10^(th) bits are stored during compression. Upon decompression, both the 9^(th) and 10^(th) bits are replaced with zeroes. This scheme is very simple to implement but it has a maximal square error of 36. The maximal square error is accurate assuming the compression of the first compression module itself introduces no error. The “No bit replication” scheme clearly minimises the quantity of data which is stored in the frame buffer during compression for the 9^(th) and 10^(th) bits, but does not achieve any of aims 1 to 4 given above and thus does not achieve high image quality after decompression.

A second simple scheme is referred to as a “Bit replication 8 to 10 bits” scheme. In this scheme, neither the 9^(th) nor the 10^(th) bits are stored during compression. During decompression, the 9^(th) and 10^(th) bits are replaced with the two most significant bits of the 10-bit data value (the 1^(st) and 2^(nd) bits). In other words, the two MSBs are appended as the two LSBs. This scheme is simple to implement and enables all values between 0 and 1023 to be representable by the decompressed data and therefore achieves aim 1. However, this scheme also has a maximal square error of 36. The maximal square error is accurate assuming the compression of the first compression module itself introduces no error. The “Bit replication 8 to 10 bits” scheme thus also minimises the quantity of data which is stored in the frame buffer during compression for the 9^(th) and 10^(th) bits, but does not achieve aims 2 to 4 given above and thus does not achieve high image quality after decompression.
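By way of illustration only, the two simple schemes may be sketched in Python as follows. The sketch assumes, as the text does, that the 8 MSBs survive compression without error. The function names are hypothetical and are not taken from the figures.

```python
def no_bit_replication(value: int) -> int:
    """Drop the 9th and 10th bits and restore them as zeroes."""
    msbs = value >> 2            # the 8 MSBs, assumed to be preserved exactly
    return msbs << 2             # two LSBs replaced with zeroes


def bit_replication_8_to_10(value: int) -> int:
    """Drop the 9th and 10th bits and restore them from the two MSBs."""
    msbs = value >> 2            # the 8 MSBs, assumed to be preserved exactly
    top_two = msbs >> 6          # the 1st and 2nd bits of the 10-bit value
    return (msbs << 2) | top_two


def max_abs_error(scheme) -> int:
    """Largest per-value error over every 10-bit input."""
    return max(abs(v - scheme(v)) for v in range(1024))


def largest_output(scheme) -> int:
    """Largest value the scheme can produce after decompression."""
    return max(scheme(v) for v in range(1024))
```

Under these assumptions each scheme is wrong by at most 3 for any single data value, i.e. a square error of at most 9 per value. One reading of the figure of 36 quoted above is that it is summed over a group of four such values, but that interpretation is an assumption rather than something stated in the text. The sketch also shows why only the second scheme reaches the top of the range: zero filling can produce nothing above 1020, whereas replication of the two MSBs can produce 1023.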

The examples described below provide compression and complementary decompression schemes for compressing and decompressing the 9^(th) and 10^(th) bits of 10-bit data values which are higher quality than the two simple schemes mentioned above, i.e. which achieve more of the aims listed above. Although the following examples are described with reference to compressing and decompressing 10-bit data values, it is to be understood that the same principles could be applied for compressing and decompressing n-bit data values, where n is not necessarily 10. The schemes described herein are particularly useful when the system already has compression and decompression units configured to compress and decompress (n−2)-bit data values, such that those compression and decompression units can be used to compress and decompress the (n−2) most significant bits of the data values and a new compression unit and a new decompression unit can be used to compress and decompress the two least significant bits of the data values.

FIG. 1A illustrates a scheme for the compression of a data value comprising 10 bits implemented in a compression unit. The compression unit comprises a first compression module 104, a second compression module 105 and dividing logic 115. In the following examples, the data is image data but in other examples may be other types of data. The data to be compressed is 10 bpc (bits per channel) image data. In other words, each pixel in the image represented by the image data comprises 10 bits per channel. FIG. 1A shows data for a single channel such that data value 101 comprises 10 bits.

The data value 101 to be compressed is split (i.e. divided) by the dividing logic 115 into a first subset of bits 102 and a second subset of bits 103. In this example, the first subset 102 comprises 8 bits and the second subset 103 comprises 2 bits. The first subset comprises the 8 most significant bits (MSBs) of the 10-bit data value. The second subset 103 comprises the 2 least significant bits (LSBs) of the 10-bit value. The second least significant bit of the 10-bit data value 101 is the most significant bit of the second subset of bits 103. In other examples, the data to be compressed has another number of bits per channel. The number of bits in the data value to be compressed may be another number. According to other examples, as described above, the data value may comprise n bits. In such examples, the first subset of bits 102 comprises a number of the most significant bits (MSBs) of the n bits. The second subset of bits 103 comprises a number of the least significant bits (LSBs) of the n bits of the data value. In all the following examples, the second subset of bits comprises the two least significant bits. The first subset of bits therefore comprises the n−2 most significant bits. In common examples, n is even and thus (n−2) is even. The first subset of bits may comprise 2^(x) bits, where x is any integer. In other words, (n−2) may be equal to 2^(x), where x is any integer. n may be equal to 6 or 18. In other words, the data value may comprise 6 bits or 18 bits.
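The division performed by the dividing logic 115 may be sketched in Python as follows, purely as an illustration. The names split_value and second_lsb are hypothetical, and a hardware implementation would typically route wires rather than shift and mask.

```python
from typing import Tuple


def split_value(value: int, n: int = 10) -> Tuple[int, int]:
    """Divide an n-bit value into (first subset, second subset).

    The first subset holds the n-2 most significant bits and the
    second subset holds the two least significant bits.
    """
    if not 0 <= value < (1 << n):
        raise ValueError("value does not fit in n bits")
    first_subset = value >> 2       # n-2 MSBs
    second_subset = value & 0b11    # 2 LSBs
    return first_subset, second_subset


def second_lsb(second_subset: int) -> int:
    """Second least significant bit of the data value, which is the
    most significant bit of the two-bit second subset."""
    return (second_subset >> 1) & 1
```

For example, the 10-bit value 0b1011001110 divides into the 8-bit first subset 0b10110011 and the 2-bit second subset 0b10. The same function covers the n=6 and n=18 cases by passing a different n.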

As shown in FIG. 1A, the compression scheme is split into compression of the MSBs (the major compression scheme) and separately, compression of the LSBs (the minor compression scheme). Compression of the MSBs is performed by the first compression module 104. Compression of the LSBs is performed by the second compression module 105. In the example seen in FIG. 1A, compression of the MSBs performed by the first compression module 104 is independent of compression of the LSBs performed by the second compression module 105. Compression of the MSBs and compression of the LSBs may therefore happen concurrently. FIG. 1A shows that the first subset of bits 102 comprising the 8 MSBs of the 10-bit data value is input into the first compression module 104. The second subset of bits 103 comprising the 2 LSBs of the 10-bit data value is input into the second compression module 105. As shown in FIG. 1A, one or more of the first subset of bits 102 (e.g. some number of the MSBs) may also be input into the second compression module 105. According to certain examples, the scheme may involve the second compression module 105 inspecting one or more of the MSBs of the first subset of bits 102. Such inspection does not prevent the first and second compression modules 104, 105 from operating concurrently.

Compression of the first subset of bits by the first compression module 104 results in a first compressed subset of bits 106. Compression of the second subset of bits by the second compression module 105 results in a second compressed subset of bits 107. Following independent compression of the first subset of bits and the second subset of bits, the first and second compressed subsets of bits are packed together and stored in memory as compressed data 108.

In examples described herein, the compression scheme is designed to achieve 50% compression of the data. Compression of the first subset of bits comprising 8 MSBs in this example can be performed using any existing technique designed to achieve at most 50% compression of 8 bpc data. In other words, the first compression module may utilise any existing compression scheme as the major compression scheme to compress the first subset of bits by at most 50%. The compression rate is defined as the size of the compressed data as a percentage of the size of the uncompressed data, so if data is compressed by “at most” 50% this means that its compressed size is 50%, or less than 50%, of its uncompressed size.

The focus of this application is the separate compression of the 2 LSBs. An advantage of performing compression of the 2 LSBs independently of compression of the MSBs is that the minor compression scheme can be appended to existing major compression schemes. New schemes for compression of MSBs need not be developed. In other words, the minor compression scheme shown in FIG. 1A can be thought of as an add-on or extension to existing compression schemes for 8 bpc data. For example, an existing compression scheme for 8 bpc data need not be modified to account for 10 bpc data, and can be used in the first compression module 104 to compress the 8 MSBs of the data values, with the 2 LSBs of the data values being compressed separately by the second compression module 105.

As explained above, in other examples, the data may not be 10 bpc data, for example, the data may be n bpc data. In such an example, the n bpc data would be split into the (n−2) MSBs and the 2 LSBs. An existing major compression scheme which can achieve at most 50% compression could be used for compression of the (n−2) MSBs.

Data can be compressed in discrete batches, which may be referred to as tiles. In the following examples, the data is compressed in tiles formed of 64 pixels, wherein there is a data value for each of the pixels. In other words, the granularity which can be used to read and write the compressed data is 64 pixels. A quad is defined as four adjacent pixels arranged in a 2×2 arrangement. Each tile formed of 64 pixels is processed as a set of quads. Therefore, the dimensions of a tile of 64 pixels may be, for example, any one of 32×2, 16×4, 8×8, 4×16 or 2×32 pixels. Furthermore, in the data storage system used in the following examples, compressed data is stored in blocks of 16 bytes. Other examples may use a different tile size and compressed data may be stored in blocks formed of a different number of bytes.
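As a minimal sketch of how a tile may be traversed as a set of quads, the following Python function lists the pixel coordinates of each 2×2 quad. The function name, the row-major ordering of the quads and the ordering of the pixels within a quad are assumptions made for illustration; the text does not prescribe any particular order.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]  # (x, y) coordinate within the tile


def quads_in_tile(width: int, height: int) -> List[List[Pixel]]:
    """Return the 2x2 quads of a width x height tile.

    Both dimensions must be even so that the tile divides exactly
    into quads of four adjacent pixels.
    """
    if width % 2 or height % 2:
        raise ValueError("tile dimensions must be even")
    return [
        [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]
        for y in range(0, height, 2)
        for x in range(0, width, 2)
    ]
```

Each of the tile shapes listed above (32×2, 16×4, 8×8, 4×16 and 2×32) yields 16 quads that together cover all 64 pixels exactly once.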

FIG. 1B illustrates decompression of the data value (compressed using the scheme shown in FIG. 1A) implemented in a decompression unit. The decompression unit comprises a first decompression module 109, a second decompression module 110 and combining logic 114. The compressed data 108 is split into a first compressed subset of bits 106 and a second compressed subset of bits 107. The decompression scheme is split into decompression of the compressed MSBs (the major decompression scheme) and separately, decompression of the compressed LSBs (the minor decompression scheme). Decompression of the compressed MSBs is performed by the first decompression module 109. Decompression of the compressed LSBs is performed by the second decompression module 110.

As will be explained in more detail below, the major decompression scheme and the minor decompression scheme may be executed serially. For example, as seen in FIG. 1B, decompression of the compressed LSBs by the second decompression module 110 may be performed after decompression of the compressed MSBs by the first decompression module 109. Decompression of the compressed LSBs may require one or more MSBs as input. In other examples, decompression of the LSBs is independent of decompression of the MSBs. In those examples, decompression of the MSBs and decompression of the LSBs may therefore happen concurrently.

Once the first compressed subset of bits 106 has been decompressed by the first decompression module 109 resulting in a set of decompressed MSBs 111 and the second compressed subset of bits 107 has been decompressed by the second decompression module 110 resulting in a set of decompressed LSBs 112, the set of decompressed MSBs 111 and the set of decompressed LSBs 112 are combined by the combining logic 114 to determine the final decompressed data value 113.
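The operation of the combining logic 114 may be illustrated by the following Python sketch, which places the decompressed MSBs above the two decompressed LSBs. The function name combine is hypothetical.

```python
def combine(msbs: int, lsbs: int, n: int = 10) -> int:
    """Rebuild an n-bit value from its n-2 MSBs and its two LSBs."""
    if not 0 <= msbs < (1 << (n - 2)):
        raise ValueError("msbs does not fit in n-2 bits")
    if not 0 <= lsbs < 4:
        raise ValueError("lsbs does not fit in 2 bits")
    return (msbs << 2) | lsbs
```

If both decompression modules reproduce their inputs exactly, combining the two subsets recovers the original data value. Any error in the final value comes from the decompressed subsets themselves, not from the combining step.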

As explained above, the data to be compressed in these examples is 10 bpc (bits per channel) image data. The following examples consider two formats of 10-bit data to be compressed. The minor compression and decompression schemes described support both of these formats. The two formats of 10-bit data are referred to as:

a) PACK16 data, otherwise known as unpacked data; and
b) PACK10 data, otherwise known as packed data.

PACK16

As shown in FIG. 2, in this format, one 10-bit data value 201 is stored in a 16-bit halfword 202 with 6 bits of padding 203. A 32-bit word therefore stores two channels' worth of data for a single pixel, or one channel's worth of data for two pixels.
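A minimal Python sketch of this layout is given below. It assumes that the 10-bit data value occupies the low bits of each halfword, that the 6 padding bits are zero and sit in the high bits, and that halfwords are stored little-endian. None of these choices is stated in the text, which refers to FIG. 2 for the layout, and the function names pack16 and unpack16 are hypothetical.

```python
import struct
from typing import List


def pack16(values: List[int]) -> bytes:
    """Store each 10-bit value in its own 16-bit halfword.

    Assumed layout: value in bits 0..9, zero padding in bits 10..15,
    little-endian byte order.
    """
    for v in values:
        if not 0 <= v <= 0x3FF:
            raise ValueError("value does not fit in 10 bits")
    return struct.pack("<%dH" % len(values), *values)


def unpack16(data: bytes) -> List[int]:
    """Recover the 10-bit values, discarding the 6 padding bits."""
    count = len(data) // 2
    halfwords = struct.unpack("<%dH" % count, data[: count * 2])
    return [h & 0x3FF for h in halfwords]
```

Two values packed this way occupy one 32-bit word, matching the statement that a word holds two channels for one pixel or one channel for two pixels.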

As explained previously with respect to FIG. 1A, for each pixel per channel of the image data, the pixel comprises 10 bits which are split into the 8 MSBs (most significant bits) and the 2 LSBs (least significant bits) to undergo compression. The second least significant bit of the 10 bits is the most significant bit of the 2 LSBs. A quad is defined as four adjacent pixels arranged in a 2×2 arrangement. Thus, for each channel in a quad, the quad comprises 32 MSBs and 8 LSBs. As previously described, compression of the MSBs is performed independently of compression of the LSBs. For compression of PACK16 data, as will be described in more detail below, compression of the MSBs is at most a 50% compression rate in examples described herein. As mentioned above, compression rate is defined as the size of the compressed data as a percentage of the size of the uncompressed data. Compression of the LSBs is also at a 50% compression rate. At most 50% compression of the MSBs in the quad results in at most 16 bits representing the 32 MSBs. 50% compression of the LSBs in the quad results in 4 bits representing the 8 LSBs. In other words, 8 LSBs are mapped onto a 4-bit encoding which is stored in memory.

As explained above, in the present examples, data is compressed in tiles formed of 64 pixels. For a tile formed of 64 pixels with data in the PACK16 format, where each pixel comprising two 10-bit data values, or each pair of pixels comprising one 10-bit data value each, (i.e. 20 bits) is padded to 32 bits, the total number of data bits stored in the tile is 1280 bits padded to 2048 bits. A tile therefore comprises 160 bytes of data padded to 256 bytes, where a byte is equal to 8 bits. Of the 160 data bytes for the tile, the MSBs for the tile occupy 128 data bytes, and the LSBs for the tile occupy 32 data bytes.

FIG. 3A shows the compression of a tile 301 formed of 160 bytes of data padded to 256 bytes implemented in a compression unit. The compression unit comprises a first compression module 104, a second compression module 105 (which comprises mapping logic 309) and dividing logic 115, as described above with reference to FIG. 1A. The tile is split by the dividing logic 115 into the MSBs 302 (128 bytes) and the LSBs 303 (32 bytes), ignoring the 96 bytes of padding. Compression of the MSBs is performed by the first compression module 104. Compression of the LSBs is performed by the second compression module 105. Compression of the MSBs (major MSB compression) is performed independently of compression of the LSBs (minor LSB compression). Both the major and minor compression schemes achieve a compression rate of at most 50%. In other words, both the first and second compression modules compress data with an at most 50% compression rate. At most 50% compression of the MSBs therefore results in at most 64 bytes of compressed data, the compressed MSBs 306. 50% compression of the LSBs results in 16 bytes of compressed data, the compressed LSBs 307. In the example shown in FIG. 3A, the compressed MSBs 306 and the compressed LSBs 307 are packed together as compressed data 308 and stored in memory. The total compressed data 308 occupies 80 bytes. The tile data has therefore been compressed from 160 bytes to 80 bytes. As explained above, in the data storage system used in the examples described herein, compressed data is stored in blocks of at most 16 bytes. For blocks of 16 bytes, compressed data of the PACK16 format formed of 80 bytes is stored in 5 blocks of 16 bytes, and analogously for smaller block sizes. For blocks of more than 16 bytes, e.g., 32 bytes, padding of the compressed data is required, e.g., 3 blocks of 32 bytes with 16 bytes of padding. In other examples, the compressed data may be stored in a different data format.
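The byte counts quoted for a PACK16 tile can be checked with a few lines of arithmetic. The following is a minimal sketch only: it assumes the tile is modelled as 64 32-bit words holding two 10-bit values each, and the names `WORDS_PER_TILE`, `max_compressed_bytes` and `blocks_needed` are illustrative rather than taken from the described system.

```python
import math

# Assumption for this sketch: a PACK16 tile is modelled as 64 32-bit words,
# each holding two 10-bit data values. All names here are illustrative.
WORDS_PER_TILE = 64
VALUES_PER_WORD = 2
MSB_BITS, LSB_BITS = 8, 2

values = WORDS_PER_TILE * VALUES_PER_WORD              # 128 ten-bit values
data_bytes = values * (MSB_BITS + LSB_BITS) // 8       # data bits, in bytes
padded_bytes = WORDS_PER_TILE * 32 // 8                # data plus padding
msb_bytes = values * MSB_BITS // 8
lsb_bytes = values * LSB_BITS // 8
# Worst case: both subsets compressed to exactly 50% of their size.
max_compressed_bytes = msb_bytes // 2 + lsb_bytes // 2

def blocks_needed(nbytes, block_size):
    """Return (number of blocks, bytes of padding) for a given block size."""
    count = math.ceil(nbytes / block_size)
    return count, count * block_size - nbytes
```

With these assumptions, `blocks_needed(max_compressed_bytes, 16)` gives five full blocks, while a 32-byte block size gives three blocks with 16 bytes of padding, matching the figures in the text.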

The compression scheme used by the second compression module 105 to perform 50% compression of the least significant bits of data in the PACK16 format is described below. This description uses an example of data for a single channel of a quad formed of four pixels arranged in a 2×2 arrangement. The minor compression scheme which is used to compress the least significant bits (LSBs) of data values for the quad uses two different techniques for mapping the eight LSBs in the quad onto a 4-bit encoding. The first technique will be referred to as “9 to 10 bit replication”. The second technique will be referred to as “Constant quad encoding”.

In other examples, in which the data to be compressed is not 10 bpc data, and/or the data is not compressed in quads (groups of four pixels), compression of the LSBs may not involve mapping onto a 4-bit encoding. Encodings comprising other numbers of bits may be used instead.

The bit replication 9 to 10 bits technique is used for the majority of types of quads. As previously described, for a single channel, a pixel is associated with a 10-bit data value comprising 2 LSBs. A quad formed of 2×2 pixels therefore comprises 8 LSBs. The method used in the first technique to map the 8 LSBs for the quad onto a 4-bit encoding comprises storing four bits, each bit of the four bits being the second least significant bit of a respective pixel. In other words, for each data value in the quad, the second least significant bit is stored in memory.

The 9 to 10 bit replication technique is summarised in Table 1 below.

TABLE 1

(2×2 arrays are written as top row / bottom row.)

| Data value type | Notation | Format | Bit indices |
|---|---|---|---|
| Input 2×2 quad | [A, B] / [C, D] | Array of four 10-bit values | e.g. A = A[9:0]; MSB = A[9] (MSBs: A[9] to A[2]); LSB = A[0] (LSBs: A[1] and A[0]) |
| Encoding | E | 4-bit value | E = E[3:0]; E[3:0] = A[1], B[1], C[1], D[1] |
| Decompressed MSBs | [A~, B~] / [C~, D~] | Array of four 8-bit values | e.g. A~ = A~[7:0] ~= A[9:2]; MSB = A~[7] ~= A[9]; LSB = A~[0] ~= A[2] |
| Decompressed LSBs | [A_, B_] / [C_, D_] | Array of four 2-bit values | e.g. A_ = A_[1:0] = A_[1] A_[0]; A_ = A[1] A[9] |
| Output 2×2 quad for PACK16 format | [A′, B′] / [C′, D′] | Array of four 10-bit values | e.g. A′ = A′[9:0] = A~ A_ = A~[7:0] A_[1:0]; A′ = A~[7:0] A[1] A[9] ~= A[9:2] A[1] A[9] |

As shown in Table 1, decompression of the 4-bit encoding to obtain decompressed LSBs involves retrieving the second least significant bit for each of the pixels in the quad (A[1], B[1], C[1], D[1]). The technique further comprises, for each pixel in the quad, bit replicating the most significant bit of the pixel (A[9], B[9], C[9] or D[9]) and appending the bit-replicated MSB onto the second least significant bit of the respective pixel.
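The mapping in Table 1 can be sketched in a few lines. This is an illustrative sketch only: the function names `compress_lsbs` and `decompress_quad` are hypothetical, the pixel order (A, B, C, D) follows Table 1, and the 8 MSBs are assumed to pass through the first compression module without error.

```python
def compress_lsbs(quad):
    """Map the 8 LSBs of a quad (A, B, C, D) onto E[3:0] = A[1] B[1] C[1] D[1]."""
    e = 0
    for value in quad:
        e = (e << 1) | ((value >> 1) & 1)   # keep only the second least significant bit
    return e

def decompress_quad(msbs, e):
    """Rebuild four 10-bit values from four 8-bit MSB values and the 4-bit encoding."""
    out = []
    for i, m in enumerate(msbs):
        bit1 = (e >> (3 - i)) & 1           # stored second least significant bit
        bit0 = (m >> 7) & 1                 # replicated most significant bit, A[9]
        out.append((m << 2) | (bit1 << 1) | bit0)
    return out

quad = [0b1000000011, 0b0000000010, 0b0111111101, 0b1111111100]
encoding = compress_lsbs(quad)
msbs = [v >> 2 for v in quad]               # assumes error-free MSB compression
restored = decompress_quad(msbs, encoding)
```

For this example quad, each restored value differs from the original by at most one unit, since only the least significant bit is approximated.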

For the majority of quads being compressed and decompressed, the bit replication 9 to 10 bits technique involving storing the second least significant bit during compression and bit replicating the most significant bit during decompression results in decompressed data which accurately represents the original uncompressed data. This technique enables all values between 0 and 1023 to be representable by the decompressed data and therefore achieves aim 1 listed above. This technique also ensures that the information in the 9^(th) bit is retained and thus achieves aim 2 listed above. The maximal square error using the bit replication 9 to 10 bits technique is 4, meaning that the maximal square error is reduced when compared with the “No bit replication” and the “Bit replication 8 to 10 bits” schemes mentioned above. The maximal square error is accurate assuming the compression of the first compression module itself introduces no error. The technique therefore achieves aims 1, 2 and 3 listed above.

However, since the information stored in the 10^(th) bit is lost during decompression, this “9 to 10 bit replication” technique cannot be used to represent any changes in colour that are less than 2. In other words, the decompressed data cannot show subtle colour gradients of 1 and cannot perfectly represent all constant quads. A constant quad is a quad in which there are no changes of colour across the four pixels. In a constant quad, the values of the two LSBs are the same for each of the pixels in the quad. In other words, all of the second subsets of bits in the group are equal. Using 9 to 10 bit replication as described above, it is not possible to accurately represent all of the possible constant quads for all values of the MSBs (A[9]) across the quad. In regions of the image represented by the image data in which changes of colour are very gradual, artefacts such as banding can occur when the 9 to 10 bit replication technique is used. These artefacts are most noticeable in areas of gradual colour change. This technique therefore cannot be used to perfectly represent 10-bit colour values in regions of low frequency (at the chosen granularity) without creating artefacts. In other words, the 9 to 10 bit replication technique does not reliably achieve aim 4 listed above. Therefore, for such regions of the image, a second technique for compressing and decompressing the LSBs is used, which is referred to herein as the “constant quad encoding” technique.

The four constant quads for the second subset of bits comprising two bits (two LSBs) are given in Table 2 below.

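TABLE 2

(Quads and bit values are written as top row / bottom row.)

| Quad | Two LSB bit values of the pixels of the quad |
|---|---|
| [0, 0] / [0, 0] | 00 00 / 00 00 |
| [1, 1] / [1, 1] | 01 01 / 01 01 |
| [2, 2] / [2, 2] | 10 10 / 10 10 |
| [3, 3] / [3, 3] | 11 11 / 11 11 |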

Using the 9 to 10 bit replication technique described above, these quads would be encoded as shown in Table 3 below.

TABLE 3

(Quads and bit values are written as top row / bottom row. “MSB = 0” means A[9] = B[9] = C[9] = D[9] = 0; “MSB = 1” means A[9] = B[9] = C[9] = D[9] = 1.)

| Uncompressed quad | Uncompressed LSBs | Encoding E | Decompressed LSBs when MSB = 0 | Decompressed quad | Square error | Decompressed LSBs when MSB = 1 | Decompressed quad | Square error |
|---|---|---|---|---|---|---|---|---|
| [0, 0] / [0, 0] | 00 00 / 00 00 | 0000 | 00 00 / 00 00 | [0, 0] / [0, 0] | 0 | 01 01 / 01 01 | [1, 1] / [1, 1] | 4 |
| [1, 1] / [1, 1] | 01 01 / 01 01 | 0000 | 00 00 / 00 00 | [0, 0] / [0, 0] | 4 | 01 01 / 01 01 | [1, 1] / [1, 1] | 0 |
| [2, 2] / [2, 2] | 10 10 / 10 10 | 1111 | 10 10 / 10 10 | [2, 2] / [2, 2] | 0 | 11 11 / 11 11 | [3, 3] / [3, 3] | 4 |
| [3, 3] / [3, 3] | 11 11 / 11 11 | 1111 | 10 10 / 10 10 | [2, 2] / [2, 2] | 4 | 11 11 / 11 11 | [3, 3] / [3, 3] | 0 |

Table 3 shows that the constant quads cannot be represented accurately using the 9 to 10 bit technique for all values of the MSB for each pixel in the quad. The square errors are accurate assuming the compression of the first compression module itself introduces no error. Table 3 shows examples of the decompression of encodings in cases in which the MSB for all pixels in the quad=0 and the MSB for all pixels in the quad=1. Quads for which the MSB is not the same for each pixel in the quad represent regions in the image straddling the middle value (e.g., quads containing one or more values at most 511 and one or more values at least 512 for 10 bits), including regions of the image in which sharp value changes occur (for example at edges/boundaries), and therefore form a less common set of cases. The quads shown in the table, for which the MSB is equal for each pixel in the quad, therefore form the more common set of cases, and in particular include the cases for which the pixel values are constant (i.e. equal) across the quad. In other words, if the MSBs are not equal then the pixel values in the quad are not equal so the quad is not a “constant quad”. When the quad is not a “constant quad” because the MSBs are not equal, then the collective MSB can be given by a single MSB, e.g., MSB=A[9], or by a Boolean expression of the MSBs, e.g., MSB=(A[9]|D[9]) & (B[9]|C[9]). (A[9]|D[9]) & (B[9]|C[9]) can be replaced with an alternative Boolean expression, for example (A[9] & D[9])|(B[9] & C[9]).
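The square errors in Table 3 can be reproduced directly. The sketch below works on 2-bit LSB values only and assumes, as in the table, that all four pixels share the same MSB; `replicate_9_to_10` and `square_error` are hypothetical helper names.

```python
def replicate_9_to_10(lsbs, msb):
    """Round-trip 2-bit LSB values through the 9 to 10 bit replication technique.

    The second least significant bit is kept; the least significant bit is
    replaced by the (shared) most significant bit of the pixel.
    """
    return [(v & 0b10) | msb for v in lsbs]

def square_error(original, decompressed):
    """Sum of squared differences over the four pixels of a quad."""
    return sum((x - y) ** 2 for x, y in zip(original, decompressed))

# Square error for each constant quad [k, k] / [k, k] and each MSB value.
errors = {
    (k, msb): square_error([k] * 4, replicate_9_to_10([k] * 4, msb))
    for k in range(4)
    for msb in (0, 1)
}
```

Each constant quad is reproduced exactly for one value of the MSB and with a square error of 4 for the other, which is the behaviour the constant quad encoding technique is intended to remove.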

In order to consistently represent the four constant quads accurately when the compressed LSBs are decompressed, each of the four constant quads is assigned to a four-bit encoding which can be decompressed to result in the desired decompressed LSBs. In other words, four encodings are chosen which are each associated with, and thereby represent, a quad in which the values of the LSBs are the same for each of the pixels in the quad. Any four encodings may be chosen. The selected encodings are referred to herein as redefined encodings. Using the constant quad encoding technique, each of these encodings will be decompressed to perfectly represent the respective constant quad.

The compression technique so far has been described with respect to the input data being a quad comprising four adjacent pixels arranged in a 2×2 arrangement, each pixel being associated with a data value (the quad therefore having four data values). However, it will be appreciated that the techniques can be used for compressing an input group of m data values, where m can be any (positive) integer. The “constant quad encoding” technique can be used to compress the two least significant bits (LSBs) of each data value of a group of m data values. According to the “constant quad encoding” technique, for an input group of m data values, the group of m data values is mapped onto an m-bit encoding where four of the possible encodings are the redefined encodings. In examples in which the input group of data values is a quad and comprises four data values, the redefined encodings are decompressed to represent constant quads. However, in other examples in which the input group has m data values, each of the four redefined encodings can be decompressed to represent a constant region of m pixels (a region in which the values of the two LSBs are the same for each pixel in the region). In order to represent such constant regions of m pixels accurately, the “constant quad encoding” technique is applied.

The technique comprises mapping the two least significant bits of each of the data values in the input group of m data values collectively onto an m-bit encoding and storing the m-bit encoding. The m-bit encoding is selected from 2^(m) possible m-bit encodings. The possible m-bit encodings are split into two groups of encodings. The first group comprises (2^(m)−4) m-bit encodings. If the selected encoding is an encoding from the first group of encodings then the selected encoding represents the two least significant bits for a representative group of m data values in which the second least significant bit of each of the data values is the same as a respective bit of the m-bit encoding.

When one of the (2^(m)−4) encodings is selected, it will be decompressed using the 9 to 10 bit replication technique described above. In other words, during decompression, if the m-bit encoding is from the first group of encodings, the 2m output bits representing the two least significant bits of each of the m decompressed data values comprise: (i) a second least significant bit of each of the m decompressed data values which is equal to a respective bit of the m-bit encoding, and (ii) a least significant bit for each of the m decompressed data values.

The second group of encodings comprises four m-bit encodings. As previously explained, these are the redefined encodings which will be decompressed to represent a constant region of m pixels. Therefore, if the selected encoding is an encoding from the second group of encodings, then the selected encoding represents the two least significant bits for a representative group of m data values in which the two least significant bits for each of the data values in the representative group are equal to the two least significant bits of the other data values in the representative group.

If the second least significant bit of each data value in the input group of m data values is the same as the second least significant bit of the other data values in the input group of m data values, then the two least significant bits of each data value in the input group of m data values are collectively mapped onto an encoding from the second group of encodings. However, if it is not the case that the second least significant bit of each data value in the input group of m data values is the same as the second least significant bit of the other data values in the input group of m data values, then the two least significant bits of each data value in the input group of m data values are collectively mapped onto an encoding from the first group of encodings.

Table 4 shows the selected redefined encodings according to an example in which m=4. Table 4 shows decompression of encodings in the most common cases in which the MSB for each pixel in the quad is equal.

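TABLE 4

(Quads and bit values are written as top row / bottom row.)

| Encoding | Decompressed LSBs using 9 to 10 bit replication when A[9] = B[9] = C[9] = D[9] = 0 | Decompressed quad | Decompressed LSBs using 9 to 10 bit replication when A[9] = B[9] = C[9] = D[9] = 1 | Decompressed quad | Decompressed quad using constant quad encoding |
|---|---|---|---|---|---|
| 0000 | 00 00 / 00 00 | [0, 0] / [0, 0] | 01 01 / 01 01 | [1, 1] / [1, 1] | [0, 0] / [0, 0] |
| 0110 | 00 10 / 10 00 | [0, 2] / [2, 0] | 01 11 / 11 01 | [1, 3] / [3, 1] | [1, 1] / [1, 1] |
| 1111 | 10 10 / 10 10 | [2, 2] / [2, 2] | 11 11 / 11 11 | [3, 3] / [3, 3] | [2, 2] / [2, 2] |
| 1001 | 10 00 / 00 10 | [2, 0] / [0, 2] | 11 01 / 01 11 | [3, 1] / [1, 3] | [3, 3] / [3, 3] |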

The four encodings 0000, 0110, 1111 and 1001 are chosen not to be decompressed using the bit replication 9 to 10 bit technique as described above. Instead, each of these encodings is associated with a constant quad and decoded to perfectly represent that constant quad. The selected four encodings are known as the redefined encodings, and are considered to be the “second group of encodings”, wherein the “first group of encodings” includes the other twelve 4-bit encodings which represent quads using the bit replication 9 to 10 bit technique. The particular encodings 0000 and 1111 shown in Table 4 are chosen as two of the redefined encodings because they are already decompressed as constant quads in the bit replication 9 to 10 bits technique, and in particular form a rotational/reflectional orbit of size one each. The particular encodings 0110 and 1001 shown in Table 4 are chosen as two of the redefined encodings because their decompressed quads in the bit replication 9 to 10 bit technique form a rotational/reflectional orbit of size two. Selecting encodings which form a rotational/reflectional orbit of size two as the redefined encodings ensures that visual loss due to compression does not depend on the orientation of the tile being compressed. Therefore, this choice of redefined encodings means that unexpected or surprising behaviour after rotation or reflection of the compressed image data is avoided. In other examples, the redefined encodings may be specified in a different order relative to the constant quad value, e.g., (0000, 0110, 1001, 1111), (1001, 0000, 0110, 1111), (1001, 0000, 1111, 0110). The first and third alternative orders benefit from the property that inverting the encoding bits inverts the bits of the decompressed quad, simplifying LUT design. In addition, the decompressed quad may switch between the original order and the second alternative order (or between the first and third alternative orders) based on the MSB of the 2×2 quad to share more logic with the 9 to 10 bit technique. In other examples, different encodings may be chosen as the redefined encodings.
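A decoder for the LSBs that follows Table 4 might look like the sketch below. It is illustrative only: `REDEFINED`, `decode_lsbs` and `rotate_encoding` are hypothetical names, the redefined encodings use the order of Table 4, and a quarter turn of the quad is modelled as [A, B] / [C, D] becoming [C, A] / [D, B].

```python
# Redefined encodings in the order of Table 4: encoding -> constant 2-bit LSB value.
REDEFINED = {0b0000: 0, 0b0110: 1, 0b1111: 2, 0b1001: 3}

def decode_lsbs(e, msb_bits):
    """Decode a 4-bit encoding E[3:0] into the 2-bit LSB values of (A, B, C, D).

    msb_bits holds bit 9 of each pixel, in the order (A, B, C, D).
    """
    if e in REDEFINED:                       # second group: constant quad
        return [REDEFINED[e]] * 4
    # First group: stored second LSB followed by the replicated MSB.
    return [(((e >> (3 - i)) & 1) << 1) | msb_bits[i] for i in range(4)]

def rotate_encoding(e):
    """Encoding of the quad after a quarter turn clockwise ([C, A] / [D, B])."""
    a, b, c, d = [(e >> (3 - i)) & 1 for i in range(4)]
    return (c << 3) | (a << 2) | (d << 1) | b
```

Rotating each redefined encoding yields another redefined encoding (0110 and 1001 swap, while 0000 and 1111 are fixed), which is one way to see why this choice does not depend on tile orientation.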

As previously discussed, with the bit replication 9 to 10 bit technique it is not possible to accurately represent many quads which contain colour value changes of less than 2, e.g., quads having a maximal colour value change of 1. For quads with a maximal colour value change of 1, the second least significant bit of each pixel in the quad may be the same as the second least significant bit of all other pixels in the quad, i.e. A[1]=B[1]=C[1]=D[1]. In other words, all four 9^(th) bits in the quad are equal in this case. Since some of these quads cannot be represented perfectly using the 9 to 10 bit replication technique, some of these quads are also assigned to one of the four redefined encodings which are decompressed to result in a constant quad, as shown above.

The 8 LSBs of a quad are mapped collectively onto a 4-bit encoding which is selected from the 16 possible encodings. This mapping is performed by mapping logic 309 of the second compression module 105. The selected encoding represents the LSBs of the quad with no greater error than any of the other 4-bit encodings would represent the LSBs of the quad. The decision about which quads to assign to the redefined encodings is made using the following method.

Quads are assigned to the redefined encoding 0000 if:

    1) All four 9^(th) bits in the quad are equal to zero, i.e. if A[1]=B[1]=C[1]=D[1]=0
    and
    2) Half or more of the four LSB values in the quad are zero. This is determined using the Boolean expression seen in equation (1).

(A[0]|D[0])&(B[0]|C[0])=0  (equation 1)

(A[0]|D[0]) & (B[0]|C[0]) can be replaced with an alternative Boolean expression, for example (A[0] & D[0])|(B[0] & C[0]).

Quads are assigned to the redefined encoding 0110 if:

    1) All four 9^(th) bits in the quad are equal to zero, i.e. if A[1]=B[1]=C[1]=D[1]=0
    and
    2) Half or more of the four LSB values in the quad are one. This is determined using the Boolean expression seen in equation (1b).

(A[0]|D[0])&(B[0]|C[0])=1  (equation 1b)

(A[0]|D[0]) & (B[0]|C[0]) can be replaced with an alternative Boolean expression, for example (A[0] & D[0])|(B[0] & C[0]).

Quads are assigned to the redefined encoding 1111 if:

    1) All four 9^(th) bits in the quad are equal to one, i.e. if A[1]=B[1]=C[1]=D[1]=1
    and
    2) Half or more of the four LSB values in the quad are zero. This is determined using the Boolean expression seen in equation (1).

(A[0]|D[0])&(B[0]|C[0])=0  (equation 1)

(A[0]|D[0]) & (B[0]|C[0]) can be replaced with an alternative Boolean expression, for example (A[0] & D[0])|(B[0] & C[0]).

Quads are assigned to the redefined encoding 1001 if:

    1) All four 9^(th) bits in the quad are equal to one, i.e. if A[1]=B[1]=C[1]=D[1]=1
    and
    2) Half or more of the four LSB values in the quad are one. This is determined using the Boolean expression seen in equation (1b).

(A[0]|D[0])&(B[0]|C[0])=1  (equation 1b)

(A[0]|D[0]) & (B[0]|C[0]) can be replaced with an alternative Boolean expression, for example (A[0] & D[0])|(B[0] & C[0]).
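The four assignment rules above can be collected into a single selection function. This is a sketch under stated assumptions: the name `redefined_encoding` is hypothetical, the quad is given as four 2-bit LSB values in the order (A, B, C, D), and the special case quads and missing quads discussed elsewhere in this description are deliberately not handled.

```python
def redefined_encoding(quad):
    """Return the redefined encoding for a quad of 2-bit LSB values (A, B, C, D).

    Returns None when the second least significant bits are not all equal,
    in which case the rules above do not apply.
    """
    a, b, c, d = quad
    second_lsbs = [(v >> 1) & 1 for v in quad]
    if len(set(second_lsbs)) != 1:
        return None
    # Bit 0 of this expression is (A[0]|D[0]) & (B[0]|C[0]), i.e. the
    # left-hand side of equations (1) and (1b).
    expr = ((a | d) & (b | c)) & 1
    table = {
        (0, 0): 0b0000,   # 9th bits zero, expression zero
        (0, 1): 0b0110,   # 9th bits zero, expression one
        (1, 0): 0b1111,   # 9th bits one,  expression zero
        (1, 1): 0b1001,   # 9th bits one,  expression one
    }
    return table[(second_lsbs[0], expr)]
```

For example, the quad with LSB values (0, 0, 1, 1) is mapped to 0110, whereas (0, 1, 1, 0) is mapped to 0000, even though both contain two zeroes and two ones.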

Table 5 shows an example of a group of quads which contain colour changes of less than 2. The quads in the group are assigned to one of the redefined encodings shown in Table 4 (in this example the quads are assigned to either the encoding 0000 which represents a constant quad $\begin{bmatrix}0 & 0 \\ 0 & 0\end{bmatrix}$, or the encoding 0110 which represents a constant quad $\begin{bmatrix}1 & 1 \\ 1 & 1\end{bmatrix}$). In this example, the MSB for all pixels in the quad is zero, i.e. A[9]=B[9]=C[9]=D[9]=0.

TABLE 5

(Quads and bit values are written as top row / bottom row. All rows assume A[9] = B[9] = C[9] = D[9] = 0.)

| Uncompressed quad | Uncompressed LSBs | Encoding E using the 9 to 10 bit replication technique | Decompressed LSBs using the first technique | Square error | A[1] = B[1] = C[1] = D[1] | (A[0]\|D[0]) & (B[0]\|C[0]) = | Chosen redefined encoding using the constant quad encoding technique | Decompressed quad | Square error |
|---|---|---|---|---|---|---|---|---|---|
| [0, 0] / [0, 0] | 00 00 / 00 00 | 0000 | [0, 0] / [0, 0] | 0 | True | 0 | 0000 | [0, 0] / [0, 0] | 0 |
| [0, 0] / [0, 1] | 00 00 / 00 01 | 0000 | [0, 0] / [0, 0] | 1 | True | 0 | 0000 | [0, 0] / [0, 0] | 1 |
| [0, 0] / [1, 0] | 00 00 / 01 00 | 0000 | [0, 0] / [0, 0] | 1 | True | 0 | 0000 | [0, 0] / [0, 0] | 1 |
| [0, 0] / [1, 1] | 00 00 / 01 01 | 0000 | [0, 0] / [0, 0] | 2 | True | 1 | 0110 | [1, 1] / [1, 1] | 2 |
| [0, 1] / [0, 0] | 00 01 / 00 00 | 0000 | [0, 0] / [0, 0] | 1 | True | 0 | 0000 | [0, 0] / [0, 0] | 1 |
| [0, 1] / [0, 1] | 00 01 / 00 01 | 0000 | [0, 0] / [0, 0] | 2 | True | 1 | 0110 | [1, 1] / [1, 1] | 2 |
| [0, 1] / [1, 0] | 00 01 / 01 00 | 0000 | [0, 0] / [0, 0] | 2 | True | 0 | 0000 | [0, 0] / [0, 0] | 2 |
| [0, 1] / [1, 1] | 00 01 / 01 01 | 0000 | [0, 0] / [0, 0] | 3 | True | 1 | 0110 | [1, 1] / [1, 1] | 1 |
| [1, 0] / [0, 0] | 01 00 / 00 00 | 0000 | [0, 0] / [0, 0] | 1 | True | 0 | 0000 | [0, 0] / [0, 0] | 1 |
| [1, 0] / [0, 1] | 01 00 / 00 01 | 0000 | [0, 0] / [0, 0] | 2 | True | 0 | 0000 | [0, 0] / [0, 0] | 2 |
| [1, 0] / [1, 0] | 01 00 / 01 00 | 0000 | [0, 0] / [0, 0] | 2 | True | 1 | 0110 | [1, 1] / [1, 1] | 2 |
| [1, 0] / [1, 1] | 01 00 / 01 01 | 0000 | [0, 0] / [0, 0] | 3 | True | 1 | 0110 | [1, 1] / [1, 1] | 1 |
| [1, 1] / [0, 0] | 01 01 / 00 00 | 0000 | [0, 0] / [0, 0] | 2 | True | 1 | 0110 | [1, 1] / [1, 1] | 2 |
| [1, 1] / [0, 1] | 01 01 / 00 01 | 0000 | [0, 0] / [0, 0] | 3 | True | 1 | 0110 | [1, 1] / [1, 1] | 1 |
| [1, 1] / [1, 0] | 01 01 / 01 00 | 0000 | [0, 0] / [0, 0] | 3 | True | 1 | 0110 | [1, 1] / [1, 1] | 1 |
| [1, 1] / [1, 1] | 01 01 / 01 01 | 0000 | [0, 0] / [0, 0] | 4 | True | 1 | 0110 | [1, 1] / [1, 1] | 0 |

Table 5 shows that these values are split up using the Boolean expression in equation (1) such that a number of quads with colour values between $\begin{bmatrix}0 & 0 \\ 0 & 0\end{bmatrix}$ and $\begin{bmatrix}1 & 1 \\ 1 & 1\end{bmatrix}$ are encoded to represent the same quads as would be represented if they were encoded using the 9 to 10 bit replication technique, and a number of the quads are assigned to a different one of the redefined encodings. Using this method, approximately half of the quads will be decompressed as constant quad $\begin{bmatrix}0 & 0 \\ 0 & 0\end{bmatrix}$ and approximately half will be decompressed as constant quad $\begin{bmatrix}1 & 1 \\ 1 & 1\end{bmatrix}$, whereas using the 9 to 10 bit replication technique they would all have been decompressed as $\begin{bmatrix}0 & 0 \\ 0 & 0\end{bmatrix}$.

By roughly bisecting the six cases with 2 zeroes and 2 ones by applying equation (1), the average shift per pixel is minimised. As seen in Table 5 above, of the six cases, four are decompressed as $\begin{bmatrix}1 & 1 \\ 1 & 1\end{bmatrix}$ and two are decompressed as $\begin{bmatrix}0 & 0 \\ 0 & 0\end{bmatrix}$.

Treating all six cases with 2 zeroes and 2 ones identically, as achieved by the bit replication 9 to 10 bits technique, leads to an average per-pixel shift of 0.5 units for such quads, which is not insignificant and may be visually apparent on regions of a decompressed image, e.g., as banding artefacts.
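The split of the six cases can be enumerated explicitly. The sketch below is illustrative only; in particular, measuring the effect as a signed mean shift per pixel (decompressed minus original) averaged over the six cases is one possible reading of “average shift”, adopted here as an assumption. The names `eq1` and `signed_shift` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def eq1(a, b, c, d):
    """Left-hand side of equation (1) applied to the least significant bits."""
    return (a | d) & (b | c)

def signed_shift(quad, target):
    """Mean signed change per pixel when the quad is decompressed as a constant."""
    return Fraction(sum(target - v for v in quad), 4)

# The six quads (A, B, C, D) of LSB values with exactly two zeroes and two ones.
two_two = [q for q in product((0, 1), repeat=4) if sum(q) == 2]
to_ones = [q for q in two_two if eq1(*q) == 1]
to_zeros = [q for q in two_two if eq1(*q) == 0]

# All six cases decompressed as zeroes (9 to 10 bit replication with MSB = 0).
bias_replication = sum(signed_shift(q, 0) for q in two_two) / len(two_two)
# Cases split between zeroes and ones according to equation (1).
bias_split = sum(signed_shift(q, eq1(*q)) for q in two_two) / len(two_two)
```

Under this reading, decompressing all six cases as zeroes darkens every such quad by half a unit per pixel, whereas the split leaves a much smaller net bias because shifts in opposite directions partly cancel.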

Furthermore, there are eight “special case” quads that also benefit (in terms of square error) from reassignment to a constant quad rather than their encoding by the bit replication 9 to 10 bit technique, in certain circumstances.

The first four of these eight special case quads are: $\begin{bmatrix}1 & 1 \\ 1 & 2\end{bmatrix}$, $\begin{bmatrix}1 & 1 \\ 2 & 1\end{bmatrix}$, $\begin{bmatrix}1 & 2 \\ 1 & 1\end{bmatrix}$ and $\begin{bmatrix}2 & 1 \\ 1 & 1\end{bmatrix}$.

All four of these quads are better represented as a constant quad of ones, $\begin{bmatrix}1 & 1 \\ 1 & 1\end{bmatrix}$, rather than as $\begin{bmatrix}0 & 0 \\ 0 & 2\end{bmatrix}$, $\begin{bmatrix}0 & 0 \\ 2 & 0\end{bmatrix}$, $\begin{bmatrix}0 & 2 \\ 0 & 0\end{bmatrix}$ or $\begin{bmatrix}2 & 0 \\ 0 & 0\end{bmatrix}$ respectively when the quad MSB is 0. The square error is reduced from 3 to 1.

The second four of these eight special case quads are: $\begin{bmatrix}2 & 2 \\ 2 & 1\end{bmatrix}$, $\begin{bmatrix}2 & 2 \\ 1 & 2\end{bmatrix}$, $\begin{bmatrix}2 & 1 \\ 2 & 2\end{bmatrix}$ and $\begin{bmatrix}1 & 2 \\ 2 & 2\end{bmatrix}$.

All four of these quads are better represented as a constant quad of twos, $\begin{bmatrix}2 & 2 \\ 2 & 2\end{bmatrix}$, rather than as $\begin{bmatrix}3 & 3 \\ 3 & 1\end{bmatrix}$, $\begin{bmatrix}3 & 3 \\ 1 & 3\end{bmatrix}$, $\begin{bmatrix}3 & 1 \\ 3 & 3\end{bmatrix}$ or $\begin{bmatrix}1 & 3 \\ 3 & 3\end{bmatrix}$ respectively when the quad MSB is 1. The square error is also reduced from 3 to 1.
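The reduction in square error for the eight special case quads can be verified numerically. The sketch below operates on 2-bit LSB values in the order (A, B, C, D) and assumes a shared MSB across the quad; `replicate` and `square_error` are hypothetical helper names.

```python
def replicate(quad, msb):
    """9 to 10 bit replication on 2-bit LSB values with a shared MSB."""
    return [(v & 0b10) | msb for v in quad]

def square_error(original, decompressed):
    """Sum of squared differences over the four pixels of a quad."""
    return sum((x - y) ** 2 for x, y in zip(original, decompressed))

first_four = [(1, 1, 1, 2), (1, 1, 2, 1), (1, 2, 1, 1), (2, 1, 1, 1)]
second_four = [(2, 2, 2, 1), (2, 2, 1, 2), (2, 1, 2, 2), (1, 2, 2, 2)]

# (error with 9 to 10 bit replication, error as a constant quad)
first_errors = [
    (square_error(q, replicate(q, 0)), square_error(q, [1] * 4)) for q in first_four
]
second_errors = [
    (square_error(q, replicate(q, 1)), square_error(q, [2] * 4)) for q in second_four
]
```

In every case the pair evaluates to a square error of 3 under replication and 1 under the constant quad, in line with the figures stated above.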

Due to the assignment (redefinition) of the redefined encodings to the four constant quads, a set of quads, which using the first bit replication technique only (i.e. the 9 to 10 bit replication technique only) would have been accurately represented using the selected redefined encodings, can no longer be perfectly represented by an encoding. In other words, the constant quads have replaced this set of quads in being the result of decompression of the selected redefined encodings. This set of quads will be referred to as the “missing quads” herein. The missing quads are therefore given new encodings which, when decompressed, will result in a set of quads which are similar to but not exactly the same as the missing quads. The missing quads are therefore represented imprecisely after decompression. These missing quads are chosen to be quads in which errors in LSBs are not as perceptually noticeable to a viewer as errors in LSBs of constant quads.

As seen in Table 4 above, the quads $\begin{bmatrix}0 & 2 \\ 2 & 0\end{bmatrix}$, $\begin{bmatrix}2 & 0 \\ 0 & 2\end{bmatrix}$, $\begin{bmatrix}1 & 3 \\ 3 & 1\end{bmatrix}$ or $\begin{bmatrix}3 & 1 \\ 1 & 3\end{bmatrix}$ are examples of missing quads that cannot be accurately represented in the constant quad encoding technique, but are sometimes accurately represented in the 9 to 10 bit replication technique, depending on the MSB values in the quad. Several schemes can be used to assign these missing quads to new encodings such that when the new encodings are decompressed, the missing quads are represented with quads which are most similar to the missing quads out of the available options provided by the 16 possible encodings given by the 4-bit encodings of the constant quad technique. Not only are the missing quads described above assigned new encodings, but so are all quads mapped to those missing quads upon decompression by the bit replication 9 to 10 bit technique. These other quads are also referred to as missing quads.

A first example of a scheme for assigning missing quads to one of the available encodings in the constant quad technique is given in Table 6 below.

TABLE 6

(Quads are written as top row / bottom row.)

| Old encoding | Old decompressed quad, A[9] = B[9] = C[9] = D[9] = 0 | Old decompressed quad, A[9] = B[9] = C[9] = D[9] = 1 | New encoding | New decompressed quad, A[9] = B[9] = C[9] = D[9] = 0 | New decompressed quad, A[9] = B[9] = C[9] = D[9] = 1 |
|---|---|---|---|---|---|
| 0110 | [0, 2] / [2, 0] | [1, 3] / [3, 1] | 0111 | [0, 2] / [2, 2] | [1, 3] / [3, 3] |
| 1001 | [2, 0] / [0, 2] | [3, 1] / [1, 3] | 1011 | [2, 0] / [2, 2] | [3, 1] / [3, 3] |

According to this scheme, missing quads which would have previously been encoded as 0110 using the first technique (the 9 to 10 bit replication technique) are now encoded as 0111. Missing quads which would have previously been encoded as 1001 using the first technique are now encoded as 1011. Encodings 1110 and 1101 are alternative new encodings for the missing quads.

Therefore, the results of decompression of these new encodings (using the previously described 9 to 10 bit replication method) are quads which are similar but not identical to the original uncompressed quads. Specifically, using this scheme, one pixel of the four pixels in the decompressed quad has a value that differs by 1 bit from the value in the original uncompressed quad. This first example scheme has a maximal square error of 12. The maximal square error necessarily increases, when compared with the “Bit replication 9 to 10 bits” scheme mentioned above, due to reinterpreting missing quad encodings as constant quad encodings. The maximal square error is accurate assuming the compression of the first compression module itself introduces no error. Its benefits include being simple and preserving an edge structure within the 2×2 quad.
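The maximal square error quoted for the first example scheme can be checked by brute force over the affected quads. This is a sketch under assumptions: quads are given as 2-bit LSB values in the order (A, B, C, D), all four pixels share one MSB, the MSBs are reproduced without error, and the names `REMAP`, `encode_bits` and `decode_replication` are hypothetical.

```python
from itertools import product

REMAP = {0b0110: 0b0111, 0b1001: 0b1011}    # old encoding -> new encoding (Table 6)

def encode_bits(quad):
    """E[3:0] = A[1] B[1] C[1] D[1] for a quad of 2-bit LSB values."""
    e = 0
    for v in quad:
        e = (e << 1) | ((v >> 1) & 1)
    return e

def decode_replication(e, msb):
    """Decode by 9 to 10 bit replication with a shared MSB."""
    return [(((e >> (3 - i)) & 1) << 1) | msb for i in range(4)]

def square_error(original, decompressed):
    return sum((x - y) ** 2 for x, y in zip(original, decompressed))

# Worst case over every quad whose old encoding is 0110 or 1001, for both MSB values.
worst = max(
    square_error(q, decode_replication(REMAP[encode_bits(q)], msb))
    for q in product(range(4), repeat=4)
    if encode_bits(q) in REMAP
    for msb in (0, 1)
)
```

The worst case arises when the pixel whose stored bit changes has an LSB value of 0 but is decompressed as 3, contributing 9 to the square error, with each of the other three pixels contributing 1.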

A second example of a scheme for assigning missing quads to one of the available encodings in the constant quad technique is shown in Table 7 below.

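TABLE 7

(Quads and bit values are written as top row / bottom row. “MSB = 0” means A[9] = B[9] = C[9] = D[9] = 0; “MSB = 1” means A[9] = B[9] = C[9] = D[9] = 1.)

| Example of original uncompressed quad | Example of original uncompressed bits | Old encoding | Old decompressed quad using first technique, MSB = 0 | Square error | Old decompressed quad using first technique, MSB = 1 | Square error | (A[0]\|D[0]) & (B[0]\|C[0]) = | New encoding | New decompressed quad | Square error |
|---|---|---|---|---|---|---|---|---|---|---|
| [0, 2] / [2, 0] | 00 10 / 10 00 | 0110 | [0, 2] / [2, 0] | 0 | [1, 3] / [3, 1] | 4 | 0 | 0110 | [1, 1] / [1, 1] | 4 |
| [2, 0] / [0, 2] | 10 00 / 00 10 | 1001 | [2, 0] / [0, 2] | 0 | [3, 1] / [1, 3] | 4 | 0 | 0110 | [1, 1] / [1, 1] | 4 |
| [1, 3] / [3, 1] | 01 11 / 11 01 | 0110 | [0, 2] / [2, 0] | 4 | [1, 3] / [3, 1] | 0 | 1 | 1111 | [2, 2] / [2, 2] | 4 |
| [3, 1] / [1, 3] | 11 01 / 01 11 | 1001 | [2, 0] / [0, 2] | 4 | [3, 1] / [1, 3] | 0 | 1 | 1111 | [2, 2] / [2, 2] | 4 |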

In Table 7, the example of an original uncompressed quad listed in the first column represents 1 of 16 original uncompressed quads which would have been compressed as the corresponding old encoding under the first technique shown in the third column. According to this second scheme, quads which would have previously been encoded as 0110 or 1001 are encoded as either 0110 or 1111. As shown in Table 7, if the result of the Boolean expression (A[0]|D[0]) & (B[0]|C[0]) for the quad is zero (meaning that half or more of the four LSB values in the quad are zero), the quad is encoded as 0110. If the result of the expression is one (meaning that half or more of the four LSB values in the quad are one), the quad is encoded as 1111. Therefore, for each set of 16 original uncompressed quads corresponding to a value of an old encoding of the first technique, the Boolean expression is used to split these sets of 16 quads roughly in half. (A[0]|D[0]) & (B[0]|C[0]) can be replaced with an alternative Boolean expression, for example (A[0] & D[0])|(B[0] & C[0]).

As previously explained and shown in Table 4, the encodings 0110 and 1111 have been reassigned so as to be decompressed as the constant quads

$\begin{matrix}\lbrack {1,1} \rbrack \\\lbrack {1,1} \rbrack\end{matrix}\text{ and }\begin{matrix}\lbrack {2,2} \rbrack \\\lbrack {2,2} \rbrack\end{matrix}$

respectively. Therefore, according to this scheme, the missing quads which are encoded as 0110 or 1111 are decompressed as these constant quads.

For example, for the missing quad

$\begin{matrix}\lbrack {0,2} \rbrack \\\lbrack {2,0} \rbrack\end{matrix},$

(A[0]|D[0]) & (B[0]|C[0])=0 and so the quad is encoded as 0110 and is decompressed as

$\begin{matrix}\lbrack {1,1} \rbrack \\\lbrack {1,1} \rbrack\end{matrix}.$

For the missing quad

$\begin{matrix}\lbrack {1,3} \rbrack \\\lbrack {3,1} \rbrack\end{matrix},$

(A[0]|D[0]) & (B[0]|C[0])=1 and so the quad is encoded as 1111 and is decompressed as

$\begin{matrix}\lbrack {2,2} \rbrack \\\lbrack {2,2} \rbrack\end{matrix}.$

In these examples, the cumulative colour value across all four pixels in the quad is the same before and after compression (but this is not necessarily the case for all examples of uncompressed quads). However, the distribution of the colour value among the pixels is altered by the compression and decompression. As per Tables 3 and 4, Tables 6 and 7 show the cases in which all MSBs in the quad are equal. This second example scheme has a maximal square error of 10. The maximal square error necessarily increases, when compared with the “Bit replication 9 to 10 bits” scheme mentioned above, due to reinterpreting missing quad encodings as constant quad encodings. The maximal square error is accurate assuming the compression of the first compression module itself introduces no error. The benefits of this second scheme include being invariant under the symmetry operations of rigid-body transforms applied to the 2×2 quad and having lower maximal square error than the first example scheme.

A third scheme may use a look-up table (LUT) to map each possible quad that would have been mapped onto a quad in the 9 to 10 bit replication technique that is a missing quad in the constant quad technique onto an encoding that is available in the constant quad technique. The LUT can be manually designed so that each quad maps onto an available encoding which has the minimal error, and so this scheme may minimise the maximal square error. This third scheme has examples with a maximal square error as low as 6. The maximal square error necessarily increases, when compared with the “Bit replication 9 to 10 bits” scheme mentioned above, due to reinterpreting missing quad encodings as constant quad encodings. The maximal square error is accurate assuming the compression of the first compression module itself introduces no error. The benefits of this third scheme include examples preserving an edge structure within the 2×2 quad and examples minimising the maximal square error.
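The two worked examples of the second scheme can be reproduced with a short Python sketch. It covers only quads whose old encoding would have been 0110 or 1001, takes the two constant quads for 0110 and 1111 from the text above, and measures the square error over the 2-bit LSB values alone (that is, assuming the MSBs are reproduced exactly). The names `remap_missing_quad`, `round_trip` and `CONSTANT_QUAD` are hypothetical.

```python
# Constant quads that the reassigned encodings decompress to, as stated above.
CONSTANT_QUAD = {0b0110: (1, 1, 1, 1), 0b1111: (2, 2, 2, 2)}


def remap_missing_quad(a, b, c, d):
    """Choose the new 4-bit encoding for a quad previously encoded as 0110 or 1001.

    a, b, c, d are the 2-bit LSB values of pixels A, B, C and D.
    """
    flag = ((a & 1) | (d & 1)) & ((b & 1) | (c & 1))
    return 0b1111 if flag else 0b0110


def square_error(original, decoded):
    """Sum of squared per-pixel differences across the quad."""
    return sum((o - x) ** 2 for o, x in zip(original, decoded))


def round_trip(quad):
    """Return (new encoding, decompressed quad, square error) for one quad."""
    encoding = remap_missing_quad(*quad)
    decoded = CONSTANT_QUAD[encoding]
    return encoding, decoded, square_error(quad, decoded)


print(round_trip((0, 2, 2, 0)))  # quad [0, 2] / [2, 0]
print(round_trip((1, 3, 3, 1)))  # quad [1, 3] / [3, 1]
```

Both example quads keep their cumulative value (4 and 8 respectively) while incurring a square error of 4, in line with the rows of Table 7.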

FIG. 3B illustrates an example of decompression of data which has been compressed in the manner shown in FIG. 3A, i.e. using one or both of the “9 to 10 bit replication” and “Constant quad encoding” techniques. The decompression illustrated in FIG. 3B is implemented in a decompression unit. The decompression unit comprises a first decompression module 109, a second decompression module 110 and combining logic 114, as described above with reference to FIG. 1B. As per the result of the compression seen in FIG. 3A, the compressed data 308 occupies 80 bytes. Prior to decompression, the compressed data 308 is split into the compressed MSBs 306 (64 bytes) and the compressed LSBs 307 (16 bytes). Decompression of the compressed MSBs 306 is performed by the first decompression module 109. Decompression of the compressed LSBs 307 is performed by the second decompression module 110. As shown in FIG. 3B, decompression of the compressed MSBs is performed before decompression of the compressed LSBs. As will be explained in more detail below, the second decompression module 110 may use at least part of the output of the first decompression module 109 to decompress the compressed LSBs. Both the major and minor decompression schemes used by the first and second decompression modules, respectively, achieve a decompression rate of at least 200%. The decompression rate is defined herein as the size of the decompressed data as a percentage of the size of the compressed data, so if compressed data is decompressed with a decompression rate of “at least” 200% this means that its decompressed size is 200%, or more than 200%, of its compressed size. In other words, both the first and second decompression modules decompress data with at least a 200% decompression rate. At least 200% decompression of the compressed MSBs 306 results in 128 bytes of decompressed data, the decompressed MSBs 311. Exactly 200% decompression of the compressed LSBs 307 results in 32 bytes of decompressed data, the decompressed LSBs 312. The combining logic 114 combines the decompressed MSBs and the decompressed LSBs, and the result of combining them is the decompressed data 313. The total decompressed data 313 occupies 160 bytes.

In the example seen in FIG. 3B, decompression of the compressed MSBs 306 is performed before decompression of the compressed LSBs 307. The major decompression scheme and the minor decompression scheme are executed serially. As previously mentioned, the details of compression and decompression of the MSBs are not discussed in this application. Any existing decompression scheme with at least a 200% decompression rate may be used by the first decompression module. The decompression scheme used by the second decompression module 110 to perform 200% decompression of the least significant bits of data in the PACK16 format is described below. This description uses an example of data for a single channel of a quad formed of four pixels arranged in a 2×2 arrangement.

As explained above, for a quad comprising 2×2 pixels, during compression, the 8 LSBs for the quad are compressed into a 4-bit encoding in the stored compressed data. During decompression of the compressed LSBs, the 4-bit encoding is converted back to 8 output bits by mapping the encoding to the output bits using the mapping logic 310 of the second decompression module 110. In other words, the minor decompression scheme has a decompression rate of 200%. After compression using the 9 to 10 bit replication technique described above, each bit of the 4-bit encoding used to represent the LSBs is the second least significant bit from one pixel in the quad. As shown in Table 1, decompression for a pixel in the quad comprises retrieving the stored second least significant bit for that pixel. Decompression of that pixel further comprises retrieving the MSB (A[9], B[9], C[9] or D[9]) which corresponds to that pixel. Decompression of the LSBs therefore uses at least some of the output of decompression of the MSBs. The MSB for a pixel which is a result of decompression of the MSBs by the first decompression module 109 is thus input into the second decompression module 110, as shown in FIG. 3B. Finally, for each pixel in the quad, decompression of the compressed LSBs comprises bit replicating the MSB for the pixel (A[9], B[9], C[9] or D[9]) and appending the result onto the respective second least significant bit of the pixel (A[1], B[1], C[1], or D[1]).
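The bit replication step described above may be sketched in Python as follows. The sketch assumes the 4-bit encoding is held as an integer whose bits, from most to least significant, are A[1], B[1], C[1] and D[1] (the order in which the encodings are written in Table 7); the function name and the tuple-based interface are hypothetical.

```python
def decompress_lsbs_bit_replication(encoding, msbs):
    """Rebuild the two LSBs of each pixel in a quad.

    encoding: 4-bit integer holding A[1], B[1], C[1], D[1] (assumed bit order).
    msbs: (A[9], B[9], C[9], D[9]) taken from the decompressed MSBs.
    Returns the 2-bit LSB values for pixels A, B, C and D.
    """
    lsbs = []
    for position, msb in enumerate(msbs):
        # Stored second least significant bit for this pixel.
        second_lsb = (encoding >> (3 - position)) & 1
        # The pixel's MSB is replicated into the least significant bit.
        lsbs.append((second_lsb << 1) | (msb & 1))
    return tuple(lsbs)


print(decompress_lsbs_bit_replication(0b0110, (0, 0, 0, 0)))
print(decompress_lsbs_bit_replication(0b0110, (1, 1, 1, 1)))
```

With all MSBs equal to 0 the encoding 0110 yields the quad [0, 2] / [2, 0], and with all MSBs equal to 1 it yields [1, 3] / [3, 1], matching the old decompressed quads listed in Table 7.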

After compression using the constant quad encoding technique described above, the bits of the 4-bit encoding may not be the second least significant bit of each pixel in the quad. The 4-bit encoding may be one of the chosen redefined encodings. In the present example, these encodings are 0000, 0110, 1111 and 1001. Decompression of these encodings is therefore not performed using the 9 to 10 bit replication decompression technique. As previously explained, these encodings have each been associated with a constant quad such that they are decompressed to result in the respective constant quad, as seen in Table 4. Decompression of the LSBs in these cases therefore does not use any of the output of decompression of the MSBs. The MSB for a pixel which is a result of decompression of the MSBs by the first decompression module 109 need not be input into the second decompression module 110 in these cases, which may provide a minor power saving.
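A minimal Python sketch of the lookup for redefined encodings is given below. Only the two mappings stated explicitly in the text (0110 and 1111) are filled in; the entries for 0000 and 1001 are defined in Table 4 and are deliberately not guessed here, so the table in this sketch is incomplete by design. The names `CONSTANT_QUAD_LSBS` and `decompress_reserved_encoding` are hypothetical.

```python
# Partial lookup of redefined encodings to the constant LSB value of the quad.
CONSTANT_QUAD_LSBS = {
    0b0110: 1,  # stated in the text: constant quad of 1s
    0b1111: 2,  # stated in the text: constant quad of 2s
    # 0b0000 and 0b1001 are also redefined; see Table 4 for their values.
}


def decompress_reserved_encoding(encoding, table=CONSTANT_QUAD_LSBS):
    """Return the constant quad for a redefined encoding, or None otherwise.

    No MSB input is needed, which is why the MSBs need not be fed to the
    second decompression module for these encodings.
    """
    value = table.get(encoding)
    if value is None:
        return None
    return (value,) * 4


print(decompress_reserved_encoding(0b0110))
print(decompress_reserved_encoding(0b0101))
```

An encoding that is not found in the table would be handled by the 9 to 10 bit replication path instead.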

By redefining a select number of encodings so that when decompressed they perfectly represent all of the possible constant quads, a number of missing quads are no longer perfectly representable. However, it has been observed that it is more important to accurately represent areas of constant colour, i.e. low frequency regions, rather than other quads of the image which are of higher spatial frequency. Specifically, it has been found that the differences between uncompressed and decompressed missing quads are not as noticeable as artefacts in constant quads. The downside of not being able to perfectly represent the missing quads (which additionally leads to an increased maximal square error) is therefore outweighed by the benefit in being able to perfectly represent all of the constant quads (which additionally may lead to a decreased mean square error). It is therefore a worthwhile trade-off.

Utilising constant quad encoding alongside 9 to 10 bit replication means that all of the constant quads can be perfectly encoded. Thus, by redefining a number of encodings to perfectly represent constant quads, the minor compression scheme is able to achieve aim 4 listed above. In other words, the minor compression scheme for PACK16 data achieves aims 1 to 4 and thus produces high quality image data after decompression.

PACK10

The compression and decompression techniques described previously with regard to data in a PACK16 format can equally be used for data in the PACK10 format, although their use for PACK10 data does not benefit from the same utilisation of additional padding bits. Further techniques have therefore been developed for compression and decompression of data in the PACK10 format as summarised in Table 8 and described below.

TABLE 8 (each entry gives the data value type, followed by its notation, format and example bit indices; quads are written as top row / bottom row)

-   Input 2 × 2 quad
    -   Notation: [A, B] / [C, D]
    -   Format: Array of four 10-bit values
    -   Bit indices: e.g. A = A[9:0]; MSB = A[9] (MSBs: A[9] to A[2]); LSB = A[0] (LSBs: A[1] and A[0])
-   PACK10 Encoding
    -   Notation: E
    -   Format: Five bit value
    -   Bit indices: E = E[4:0]; E[3:0] = A[1], B[1], C[1], D[1]; E[4] = (A[0] | D[0]) & (B[0] | C[0])
-   Decompressed MSBs
    -   Notation: [A~, B~] / [C~, D~]
    -   Format: Array of four 8-bit values
    -   Bit indices: e.g. A~ = A~[7:0] ~= A[9:2]; MSB = A~[7] ~= A[9]; LSB = A~[0] ~= A[2]
-   Decompressed LSBs
    -   Notation: [A_, B_] / [C_, D_]
    -   Format: Array of four 2-bit values
    -   Bit indices: e.g. A_ = A_[1:0] = A_[1] A_[0]; A_ = A[1] E[4] = A[1] [(A[0] | D[0]) & (B[0] | C[0])]
-   Output 2 × 2 quad for PACK10 format
    -   Notation: [A′, B′] / [C′, D′]
    -   Format: Array of four 10-bit values
    -   Bit indices: e.g. A′ = A′[9:0] = A~ A_ = A~[7:0] A_[1:0]; A′ = A~[7:0] A[1] [(A[0] | D[0]) & (B[0] | C[0])] ~= A[9:2] A[1] [(A[0] | D[0]) & (B[0] | C[0])]

As seen in FIG. 4, in the PACK10 format, three 10-bit data values 401a, 401b and 401c are stored in each 32-bit word 402 with 2 bits of padding 403. As previously described, each 10-bit data value is divided into a first subset of bits comprising the 8 MSBs and a second subset of bits comprising the 2 LSBs. A 32-bit word may correspond to a single pixel storing 3 channels' worth of data, or 3 pixels storing 1 channel's worth of data each. For a tile formed of 64 pixels of 3 channels or 192 pixels of 1 channel, each pixel or set of 3 pixels being associated with three 10-bit data values (i.e. 30 bits padded to 32 bits), the total number of data bits stored in the tile is 1920 bits padded to 2048 bits. A tile therefore comprises 240 bytes of data padded to 256 bytes. Of the 240 data bytes for the tile, the MSBs for the tile occupy 192 data bytes. The LSBs for the tile occupy 48 data bytes.
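The tile sizes quoted above follow from simple arithmetic, sketched here in Python. The constant names are hypothetical; the figures (three 10-bit values per 32-bit word, 64 words per tile, 8 MSBs and 2 LSBs per value) are taken from the description.

```python
BITS_PER_VALUE = 10   # each data value is 10 bits
MSBS_PER_VALUE = 8    # first subset of bits
LSBS_PER_VALUE = 2    # second subset of bits
VALUES_PER_WORD = 3   # three values packed per 32-bit word
WORD_BITS = 32
WORDS_PER_TILE = 64   # 64 pixels of 3 channels, or 192 pixels of 1 channel

values_per_tile = WORDS_PER_TILE * VALUES_PER_WORD

data_bytes = values_per_tile * BITS_PER_VALUE // 8    # unpadded tile data
padded_bytes = WORDS_PER_TILE * WORD_BITS // 8        # tile data with padding
msb_bytes = values_per_tile * MSBS_PER_VALUE // 8     # MSBs for the tile
lsb_bytes = values_per_tile * LSBS_PER_VALUE // 8     # LSBs for the tile

print(data_bytes, padded_bytes, msb_bytes, lsb_bytes)
```

The MSB and LSB byte counts sum to the unpadded tile size, and the difference between the padded and unpadded sizes is the 16 bytes of padding that the dividing logic ignores.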

FIG. 5A shows the compression of a tile 501 formed of 240 bytes of data padded to 256 bytes implemented in a compression unit. The compression unit comprises a first compression module 104, a second compression module 105 (which comprises mapping logic 309) and dividing logic 115, as described above with reference to FIG. 1A. The tile is split by the dividing logic 115 into the MSBs 502 (192 bytes) and the LSBs 503 (48 bytes), ignoring the 16 bytes of padding. Compression of the MSBs is performed by the first compression module 104. Compression of the LSBs is performed by the second compression module 105. Compression of the MSBs (major MSB compression) is performed independently of compression of the LSBs (minor LSB compression). The major compression scheme achieves a compression rate of at most 50%. As previously discussed, any compression scheme with a compression rate of at most 50% may be used. At most 50% compression of the MSBs therefore results in at most 96 bytes of compressed data, the compressed MSBs 506.

As will be explained in more detail below, compression of the LSBs by the second compression module is performed at a rate of no more than 66.66 . . . %. Thus compression of the LSBs results in compressed LSBs 507 occupying no more than 32 bytes.

In the example shown in FIG. 5A, the compressed MSBs 506 and the compressed LSBs 507 are packed together as compressed data 508 and stored in memory. The total compressed data 508 occupies 128 bytes. The tile data has therefore been compressed from 240 bytes to 128 bytes. The overall compression of data in the PACK10 format due to the major compression scheme and the minor compression scheme is therefore at a compression rate of 53.33%.

As explained above, in the data storage system used in the examples described herein, compressed data is stored in blocks of at least 16 bytes. Compressed data of the PACK10 format formed of 128 bytes is therefore stored in 8 blocks of 16 bytes. In other examples, the compressed data may be stored in a different data format. Compression of the LSBs at a rate of 66.66 . . . % means that additional storage space in the blocks of at least 16 bytes in the data storage system is utilised. If, instead, the LSBs were compressed at a rate of 50%, storage space in the at least 16 byte blocks would be wasted and would be filled with padding (e.g., 8 bytes' worth). Rather than sticking strictly to a 50% compression rate and then padding the resulting data to fit into an integer number of 16-byte blocks, a higher compression rate can be used (with less padding) and the compressed data can still fit into the same integer number of 16-byte blocks.
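The block-packing argument can be checked with a short Python sketch. It assumes the worst case in which the MSBs are compressed at exactly 50% (the description only requires at most 50%) and uses exact fractions to avoid rounding issues; `stored_size` is a hypothetical helper name.

```python
import math
from fractions import Fraction

BLOCK_BYTES = 16   # compressed data is stored in 16-byte blocks
MSB_BYTES = 192    # MSBs of one PACK10 tile
LSB_BYTES = 48     # LSBs of one PACK10 tile


def stored_size(msb_rate, lsb_rate):
    """Return (compressed bytes, blocks needed, padding bytes) for one tile."""
    compressed = MSB_BYTES * msb_rate + LSB_BYTES * lsb_rate
    blocks = math.ceil(compressed / BLOCK_BYTES)
    padding = blocks * BLOCK_BYTES - compressed
    return compressed, blocks, padding


# LSBs at two thirds: the eight blocks are filled exactly.
print(stored_size(Fraction(1, 2), Fraction(2, 3)))
# LSBs at one half: still eight blocks, but 8 bytes are padding.
print(stored_size(Fraction(1, 2), Fraction(1, 2)))
# Overall compression rate for the tile, as a percentage.
overall_rate = float(Fraction(128, 240) * 100)
print(round(overall_rate, 2))
```

Both choices occupy the same eight blocks, so the extra 8 bytes used by the two-thirds rate come at no storage cost.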

The compression scheme used by the second compression module 105 to perform compression of the least significant bits of data in the PACK10 format is described below. This description uses an example of data for a single channel of a quad formed of four pixels arranged in a 2×2 arrangement.

For each channel in a quad, which is defined as 2×2 pixels, the quad comprises 32 MSBs and 8 LSBs. Following at most 50% compression of the MSBs, the compressed data for the quad comprises 16 bits (possibly padded) representing the 32 MSBs. Compression of the 8 LSBs at a rate of 66.66 . . . % results in the compressed data for the quad comprising 5.33 . . . bits representing the 8 LSBs. For a single quad, this is rounded down such that 8 LSBs are compressed into a 5-bit encoding in the stored compressed data. Compressing the 8 LSBs into 5 bits represents a 62.5% compression of the LSBs.

As previously described, for a single channel, a pixel is associated with a 10-bit data value comprising 8 MSBs and 2 LSBs. As seen in FIG. 4, across 3 channels or 1 channel across 3 pixels, a pixel or set of 3 pixels is associated with 30 bits of data which are stored alongside 2 bits of padding in a 32-bit word. Thus, for each channel, a quad comprising 2×2 pixels comprises 32 MSBs and 8 LSBs. The method used to compress the 8 LSBs across a quad, where each pixel comprises 2 LSBs, comprises mapping the 8 LSBs onto a 5-bit encoding to be stored. The mapping of the 8 LSBs onto a 5-bit encoding is performed by the mapping logic 309 of the second compression module 105. The method for mapping the 8 LSBs onto a 5-bit encoding is illustrated in Table 8 above. The 5-bit encoding comprises the second least significant bit from each pixel in the quad (totalling 4 bits) and one bit which is indicative of a least significant bit for the four pixels. In other words, the 5-bit encoding comprises the second least significant bit of each 10-bit data value in the quad and an additional bit indicative of the least significant bit of the four data values. The additional bit is effectively shared by all four values in the quad.

The method for compressing the 8 LSBs thus comprises storing four bits, each of the four bits being the second least significant bit of a pixel in the quad. The method further comprises storing a fifth bit which is indicative of a least significant bit for the four pixels. The fifth bit is calculated by applying the same Boolean expression used in equation (1) above to the least significant bits of the four pixels in the quad as below.

-   When (A[0]|D[0]) & (B[0]|C[0])=1, the fifth bit=1
-   When (A[0]|D[0]) & (B[0]|C[0])=0, the fifth bit=0

As previously explained with respect to the PACK16 data format, (A[0]|D[0]) & (B[0]|C[0])=1 means that half or more of the four LSB values in the quad are 1. (A[0]|D[0]) & (B[0]|C[0])=0 means that half or more of the four LSB values in the quad are 0. The expression (A[0]|D[0]) & (B[0]|C[0]) can be replaced with an alternative Boolean expression, for example (A[0] & D[0])|(B[0] & C[0]).
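The mapping of the 8 LSBs of a quad onto the 5-bit encoding E of Table 8 may be sketched in Python as follows, with E[3:0] = A[1], B[1], C[1], D[1] and E[4] given by the Boolean expression above. The function name is hypothetical, and the inputs may be either full 10-bit values or just their two LSBs, since only bits 1 and 0 are read.

```python
def compress_lsbs_pack10(a, b, c, d):
    """Map the two LSBs of pixels A, B, C and D onto a 5-bit encoding E."""
    # E[4]: shared bit, (A[0] | D[0]) & (B[0] | C[0]).
    shared = ((a | d) & (b | c)) & 1
    # E[3:0]: second least significant bit of each pixel, A first.
    second_lsbs = (
        (((a >> 1) & 1) << 3)
        | (((b >> 1) & 1) << 2)
        | (((c >> 1) & 1) << 1)
        | ((d >> 1) & 1)
    )
    return (shared << 4) | second_lsbs


print(format(compress_lsbs_pack10(0, 2, 2, 0), "05b"))
print(format(compress_lsbs_pack10(1, 3, 3, 1), "05b"))
```

For the quad [0, 2] / [2, 0] the shared bit is 0 and the encoding is 00110, whereas for [1, 3] / [3, 1] the shared bit is 1 and the encoding is 10110.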

FIG. 5B illustrates an example of decompression of data which has been compressed in the manner shown in FIG. 5A. The decompression of the data is implemented in a decompression unit. The decompression unit comprises a first decompression module 109, a second decompression module 110 (which comprises mapping logic 310) and combining logic 114, as described above with reference to FIG. 1B. As per the result of the compression seen in FIG. 5A, the compressed data 508 occupies 128 bytes. The compressed data 508 is split into the compressed MSBs 506 (96 bytes) and the compressed LSBs 507 (32 bytes). Decompression of the compressed MSBs 506 is performed by the first decompression module 109. Decompression of the compressed LSBs 507 is performed by the second decompression module 110. As shown in FIG. 5B, decompression of the compressed MSBs may be performed at the same time as decompression of the compressed LSBs. The major decompression scheme and the minor decompression scheme may be executed concurrently. As will be explained in more detail below, this is because the second decompression module 110 does not need to use the output of the first decompression module 109 to decompress the compressed LSBs.

The major decompression scheme achieves a decompression rate of at least 200%. As mentioned above, the decompression rate is defined as the size of the decompressed data as a percentage of the size of the compressed data. As previously mentioned, the details of compression and decompression of the MSBs are not discussed in this application. Any decompression scheme with at least a 200% decompression rate may be used. At least 200% decompression of the MSBs results in 192 bytes of decompressed MSBs 511. As will be explained in more detail below, decompression of the LSBs by the second decompression module 110 is performed at a decompression rate of 160%. Thus decompression of the LSBs results in decompressed LSBs 512 occupying 48 bytes. The combining logic 114 combines the decompressed MSBs 511 with the decompressed LSBs 512 to determine the decompressed data 513. The total decompressed data 513 occupies 240 bytes.
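A minimal Python sketch of what the combining logic does for one quad is shown below, following the relation A′ = A~[7:0] A_[1:0] from Table 8: each 8-bit decompressed MSB value is concatenated with the corresponding 2-bit decompressed LSB value. The inverse split, as performed by the dividing logic, is included for comparison. Both function names are hypothetical.

```python
def divide_value(value10):
    """Split a 10-bit value into its 8 MSBs and its 2 LSBs."""
    return value10 >> 2, value10 & 0b11


def combine_quad(msbs8, lsbs2):
    """Concatenate 8-bit MSB values with 2-bit LSB values, pixel by pixel."""
    return tuple(
        ((msb & 0xFF) << 2) | (lsb & 0b11) for msb, lsb in zip(msbs8, lsbs2)
    )


print(divide_value(1023))
print(combine_quad((0xFF, 0x00, 0x80, 0x01), (3, 0, 1, 2)))
```

Splitting and then recombining a value without any compression in between returns the original 10-bit value, so any error in the output comes only from the compression schemes themselves.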

The decompression scheme used by the second decompression module 110 to perform 160% decompression of the least significant bits of data in the PACK10 format is described below. This description uses an example of data for a single channel of a quad formed of four pixels arranged in a 2×2 arrangement. As explained above, for a quad comprising 2×2 pixels, during compression, the 8 LSBs for the quad are compressed into a 5-bit encoding in the stored compressed data. During decompression of the compressed LSBs, the 5-bit encoding is converted back to 8 output bits. In other words, the minor decompression scheme decompresses the compressed LSBs at a rate of 160%.

The method of decompression of PACK10 data is shown in Table 8 above. Decompression of the 5-bit encoding is performed by the mapping logic 310 of the second decompression module 110 and comprises retrieving from the encoding the four bits which are the second least significant bit of each pixel in the quad. Decompression further comprises retrieving the fifth bit which is indicative of a least significant bit for the four pixels, the fifth bit having been calculated using a Boolean expression of the least significant bits of the four pixels in the quad. Finally, for each pixel in the quad, decompression of the compressed LSBs comprises appending the bit indicative of the least significant bit for the four pixels onto the respective second least significant bit of the pixel. Decompression of the LSBs by the second decompression module 110 therefore uses only data stored as the compressed LSBs. The second decompression module 110 does not require any input from the output of the first decompression module 109 (any of the decompressed MSBs).
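The decompression steps above can be sketched in Python as follows, using the bit layout of Table 8 (E[3:0] = A[1], B[1], C[1], D[1] and E[4] as the shared bit). The function name is hypothetical. Note that the function takes only the 5-bit encoding as input, reflecting that no decompressed MSBs are required.

```python
def decompress_lsbs_pack10(encoding):
    """Expand a 5-bit encoding E into the 2-bit LSB values of A, B, C and D."""
    shared = (encoding >> 4) & 1  # E[4], shared by all four pixels
    return tuple(
        # Stored second least significant bit, followed by the shared bit.
        (((encoding >> (3 - position)) & 1) << 1) | shared
        for position in range(4)
    )


print(decompress_lsbs_pack10(0b00110))
print(decompress_lsbs_pack10(0b10110))
```

The encoding 00110 expands to the quad [0, 2] / [2, 0] and 10110 expands to [1, 3] / [3, 1].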

This compression and decompression scheme for data in the PACK10 format enables all values between 0 and 1023 to be representable by the decompressed data and therefore achieves aim 1 listed above. This technique also ensures that the information in the 9th bit (the second least significant bit) is retained and thus achieves aim 2 listed above. The maximal square error using this technique is 2, meaning that the maximal square error is reduced when compared with the “No bit replication” and the “Bit replication 8 to 10 bits” schemes mentioned above and therefore aim 3 listed above is achieved. The maximal square error is accurate assuming the compression of the first compression module itself introduces no error. Furthermore, constant quads can be accurately represented using this compression and decompression scheme (because all of the pixel values of a constant quad have the same 10th bit, which will be stored as the fifth bit of the encoding) and therefore aim 4 listed above is achieved. The technique therefore achieves aims 1 to 4 listed above.

In addition, this method of compressing and decompressing data in the PACK10 format makes use of most of the extra bits available in the PACK10 format. Accordingly this technique allows additional information to be stored during compression meaning that the resulting data after decompression more accurately represents the original uncompressed data. The PACK10 technique described herein therefore allows high quality images to be obtained following decompression.
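The maximal square error and the treatment of constant quads can be checked by brute force. The Python sketch below enumerates all 256 combinations of 2-bit LSB values for a quad, assumes the MSBs are reproduced exactly, and round-trips each quad through the 5-bit encoding of Table 8. The helper names `encode`, `decode` and `square_error` are hypothetical.

```python
from itertools import product


def encode(quad):
    """8 LSBs of a quad (four 2-bit values) to the 5-bit encoding E."""
    a, b, c, d = quad
    shared = ((a | d) & (b | c)) & 1
    second = ((a >> 1) << 3) | ((b >> 1) << 2) | ((c >> 1) << 1) | (d >> 1)
    return (shared << 4) | second


def decode(encoding):
    """5-bit encoding E back to four 2-bit LSB values."""
    shared = (encoding >> 4) & 1
    return tuple((((encoding >> (3 - i)) & 1) << 1) | shared for i in range(4))


def square_error(original, decoded):
    """Sum of squared per-pixel differences across the quad."""
    return sum((o - x) ** 2 for o, x in zip(original, decoded))


all_quads = list(product(range(4), repeat=4))
max_error = max(square_error(q, decode(encode(q))) for q in all_quads)
constant_quads_exact = all(
    decode(encode((v,) * 4)) == (v,) * 4 for v in range(4)
)
print(max_error, constant_quads_exact)
```

Under these assumptions the search reports a maximal square error of 2 and confirms that all four constant quads are reproduced exactly, consistent with aims 3 and 4.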

FIG. 6 shows a graphics processing unit 601. The graphics processing unit 601 may be configured to perform any of the methods described herein. The graphics processing unit comprises a compression unit 602, a memory 603 and a decompression unit 604. The compression unit 602 comprises the first compression module 104, second compression module 105 and dividing logic 115 previously described and shown in FIGS. 1A, 3A and 5A. The decompression unit 604 comprises the first decompression module 109, the second decompression module 110 and the combining logic 114 previously described and shown in FIGS. 1B, 3B and 5B.

When data 605 is input into the compression unit 602 from a location in the GPU, the compression unit 602 compresses the input data 605 using the dividing logic 115, the first compression module 104 and the second compression module 105 using one or more of the compression methods previously described. Compressed data 606 is output from the compression unit 602 and stored in memory 603.

Compressed data 606 is output from the memory 603 and input into the decompression unit 604. The decompression unit 604 decompresses the compressed data 606 using the first decompression module 109, second decompression module 110 and the combining logic 114 using one or more of the decompression methods previously described. Decompressed data 607 is output from the decompression unit 604 to another location in the GPU.

In the example seen in FIG. 6, the compression unit 602 and decompression unit 604 are separate units, but in other examples they may be combined into a single unit. Similarly, any two or more of the first compression module 104, the second compression module 105, the first decompression module 109 and the second decompression module 110 may be combined into a single unit.

FIG. 7 shows a computer system in which the compression unit and/or decompression unit described herein may be implemented. The computer system comprises a CPU 702, a GPU 704, a memory 706 and other devices 714, such as a display 716, speakers 718 and a camera 722. A processing block 710 is implemented on the GPU 704, as well as a Neural Network Accelerator (NNA) 711. The components of the computer system can communicate with each other via a communications bus 720. A store 712 (corresponding to memory 603) is implemented as part of the memory 706.

While FIG. 7 illustrates one implementation of a graphics processing unit, it will be understood that a similar block diagram could be drawn for an artificial intelligence accelerator system, for example by replacing either the CPU 702 or the GPU 704 with a Neural Network Accelerator (NNA) 711, or by adding the NNA as a separate unit. In such cases, again, the processing block 710 can be implemented in the NNA.

The graphics processing unit of FIG. 6 is shown as comprising a number of functional blocks. This is schematic only and is not intended to define a strict division between different logic elements of such entities. Each functional block may be provided in any suitable manner. It is to be understood that intermediate values described herein as being formed by a graphics processing unit need not be physically generated by the graphics processing unit at any point and may merely represent logical values which conveniently describe the processing performed by the graphics processing unit between its input and output.

The compression units and decompression units described herein may be embodied in hardware on an integrated circuit. The compression units and decompression units described herein may be configured to perform any of the methods described herein. Generally, any of the functions, methods, techniques or components described above can be implemented in software, firmware, hardware (e.g., fixed logic circuitry), or any combination thereof. The terms “module,” “functionality,” “component”, “element”, “unit”, “block” and “logic” may be used herein to generally represent software, firmware, hardware, or any combination thereof. In the case of a software implementation, the module, functionality, component, element, unit, block or logic represents program code that performs the specified tasks when executed on a processor. The algorithms and methods described herein could be performed by one or more processors executing code that causes the processor(s) to perform the algorithms/methods. Examples of a computer-readable storage medium include a random-access memory (RAM), read-only memory (ROM), an optical disc, flash memory, hard disk memory, and other memory devices that may use magnetic, optical, and other techniques to store instructions or other data and that can be accessed by a machine.

The terms computer program code and computer readable instructions as used herein refer to any kind of executable code for processors, including code expressed in a machine language, an interpreted language or a scripting language. Executable code includes binary code, machine code, bytecode, code defining an integrated circuit (such as a hardware description language or netlist), and code expressed in a programming language such as C, Java or OpenCL. Executable code may be, for example, any kind of software, firmware, script, module or library which, when suitably executed, processed, interpreted, compiled, or executed at a virtual machine or other software environment, causes a processor of the computer system at which the executable code is supported to perform the tasks specified by the code.

A processor, computer, or computer system may be any kind of device, machine or dedicated circuit, or collection or portion thereof, with processing capability such that it can execute instructions. A processor may be or comprise any kind of general purpose or dedicated processor, such as a CPU, GPU, NNA, System-on-chip, state machine, media processor, an application-specific integrated circuit (ASIC), a programmable logic array, a field-programmable gate array (FPGA), or the like. A computer or computer system may comprise one or more processors.

It is also intended to encompass software which defines a configuration of hardware as described herein, such as HDL (hardware description language) software, as is used for designing integrated circuits, or for configuring programmable chips, to carry out desired functions. That is, there may be provided a computer readable storage medium having encoded thereon computer readable program code in the form of an integrated circuit definition dataset that when processed (i.e. run) in an integrated circuit manufacturing system configures the system to manufacture a compression unit or a decompression unit configured to perform any of the methods described herein, or to manufacture a compression unit or a decompression unit comprising any apparatus described herein. An integrated circuit definition dataset may be, for example, an integrated circuit description.

Therefore, there may be provided a method of manufacturing, at an integrated circuit manufacturing system, a compression unit or a decompression unit as described herein. Furthermore, there may be provided an integrated circuit definition dataset that, when processed in an integrated circuit manufacturing system, causes the method of manufacturing a compression unit or a decompression unit to be performed.

An integrated circuit definition dataset may be in the form of computer code, for example as a netlist, code for configuring a programmable chip, as a hardware description language defining hardware suitable for manufacture in an integrated circuit at any level, including as register transfer level (RTL) code, as high-level circuit representations such as Verilog or VHDL, and as low-level circuit representations such as OASIS (RTM) and GDSII. Higher level representations which logically define hardware suitable for manufacture in an integrated circuit (such as RTL) may be processed at a computer system configured for generating a manufacturing definition of an integrated circuit in the context of a software environment comprising definitions of circuit elements and rules for combining those elements in order to generate the manufacturing definition of an integrated circuit so defined by the representation. As is typically the case with software executing at a computer system so as to define a machine, one or more intermediate user steps (e.g. providing commands, variables etc.) may be required in order for a computer system configured for generating a manufacturing definition of an integrated circuit to execute code defining an integrated circuit so as to generate the manufacturing definition of that integrated circuit.

An example of processing an integrated circuit definition dataset at an integrated circuit manufacturing system so as to configure the system to manufacture a compression unit or a decompression unit will now be described with respect to FIG. 8.

FIG. 8 shows an example of an integrated circuit (IC) manufacturingsystem 802 which is configured to manufacture a compression unit or adecompression unit as described in any of the examples herein. Inparticular, the IC manufacturing system 802 comprises a layoutprocessing system 804 and an integrated circuit generation system 806.The IC manufacturing system 802 is configured to receive an ICdefinition dataset (e.g. defining a graphics processing unit asdescribed in any of the examples herein), process the IC definitiondataset, and generate an IC according to the IC definition dataset (e.g.which embodies a compression unit or a decompression unit as describedin any of the examples herein). The processing of the IC definitiondataset configures the IC manufacturing system 802 to manufacture anintegrated circuit embodying a compression unit or a decompression unitas described in any of the examples herein.

The layout processing system 804 is configured to receive and process the IC definition dataset to determine a circuit layout. Methods of determining a circuit layout from an IC definition dataset are known in the art, and for example may involve synthesising RTL code to determine a gate level representation of a circuit to be generated, e.g. in terms of logical components (e.g. NAND, NOR, AND, OR, MUX and FLIP-FLOP components). A circuit layout can be determined from the gate level representation of the circuit by determining positional information for the logical components. This may be done automatically or with user involvement in order to optimise the circuit layout. When the layout processing system 804 has determined the circuit layout it may output a circuit layout definition to the IC generation system 806. A circuit layout definition may be, for example, a circuit layout description.

The IC generation system 806 generates an IC according to the circuit layout definition, as is known in the art. For example, the IC generation system 806 may implement a semiconductor device fabrication process to generate the IC, which may involve a multiple-step sequence of photolithographic and chemical processing steps during which electronic circuits are gradually created on a wafer made of semiconducting material. The circuit layout definition may be in the form of a mask which can be used in a lithographic process for generating an IC according to the circuit definition. Alternatively, the circuit layout definition provided to the IC generation system 806 may be in the form of computer-readable code which the IC generation system 806 can use to form a suitable mask for use in generating an IC.

The different processes performed by the IC manufacturing system 802 may be implemented all in one location, e.g. by one party. Alternatively, the IC manufacturing system 802 may be a distributed system such that some of the processes may be performed at different locations, and may be performed by different parties. For example, some of the stages of: (i) synthesising RTL code representing the IC definition dataset to form a gate level representation of a circuit to be generated, (ii) generating a circuit layout based on the gate level representation, (iii) forming a mask in accordance with the circuit layout, and (iv) fabricating an integrated circuit using the mask, may be performed in different locations and/or by different parties.

In other examples, processing of the integrated circuit definition dataset at an integrated circuit manufacturing system may configure the system to manufacture a compression unit or a decompression unit without the IC definition dataset being processed so as to determine a circuit layout. For instance, an integrated circuit definition dataset may define the configuration of a reconfigurable processor, such as an FPGA, and the processing of that dataset may configure an IC manufacturing system to generate a reconfigurable processor having that defined configuration (e.g. by loading configuration data to the FPGA).

In some embodiments, an integrated circuit manufacturing definition dataset, when processed in an integrated circuit manufacturing system, may cause an integrated circuit manufacturing system to generate a device as described herein. For example, the configuration of an integrated circuit manufacturing system in the manner described above with respect to FIG. 8 by an integrated circuit manufacturing definition dataset may cause a device as described herein to be manufactured.

In some examples, an integrated circuit definition dataset could include software which runs on hardware defined at the dataset or in combination with hardware defined at the dataset. In the example shown in FIG. 8, the IC generation system may further be configured by an integrated circuit definition dataset to, on manufacturing an integrated circuit, load firmware onto that integrated circuit in accordance with program code defined at the integrated circuit definition dataset or otherwise provide program code with the integrated circuit for use with the integrated circuit.

The implementation of concepts set forth in this application in devices, apparatus, modules, and/or systems (as well as in methods implemented herein) may give rise to performance improvements when compared with known implementations. The performance improvements may include one or more of increased computational performance, reduced latency, increased throughput, and/or reduced power consumption. During manufacture of such devices, apparatus, modules, and systems (e.g. in integrated circuits) performance improvements can be traded off against the physical implementation, thereby improving the method of manufacture. For example, a performance improvement may be traded against layout area, thereby matching the performance of a known implementation but using less silicon. This may be done, for example, by reusing functional blocks in a serialised fashion or sharing functional blocks between elements of the devices, apparatus, modules and/or systems. Conversely, concepts set forth in this application that give rise to improvements in the physical implementation of the devices, apparatus, modules, and systems (such as reduced silicon area) may be traded for improved performance. This may be done, for example, by manufacturing multiple instances of a module within a predefined area budget.

The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.

What is claimed is:
 1. A computer-implemented method for compressing an n-bit data value, the method comprising: dividing the n bits of the data value into a first subset of bits and a second subset of bits, the first subset comprising the n−2 most significant bits of the data value and the second subset comprising the two least significant bits of the data value; performing compression of the first subset using a first compression module; and performing compression of the second subset using a second compression module, the first and second compression modules implementing different compression schemes.
 2. The method according to claim 1, wherein (n−2)=2^(x), wherein x is an integer.
 3. The method according to claim 1, wherein n=6 or n=10 or n=18.
 4. The method according to claim 1, wherein the first compression module compresses the first subset by 50%.
 5. The method according to claim 1, wherein the compression of the first subset is independent of the compression of the second subset.
 6. The method according to claim 1, wherein the data value represents image data.
 7. The method according to claim 1, further comprising storing in memory the result of the compression of the first subset and the result of the compression of the second subset.
 8. The method according to claim 1, wherein the second compression module compresses the second subset by at least 62.5% and/or by no more than 66.7%.
 9. The method according to claim 1, wherein for a group of four n-bit data values, compression of the second subsets of bits across the group of four n-bit data values comprises storing: four bits comprising a second least significant bit of each of the data values in the group; and one bit indicative of a least significant bit for the group of four data values.
 10. The method according to claim 9, wherein the one bit indicative of a least significant bit for the group of four data values is generated using a Boolean expression of the least significant bits of the four data values in the group.
 11. The method according to claim 1, wherein the second compression module compresses the second subset by 50%.
 12. The method according to claim 11, wherein compression of the second subset comprises, for each data value, storing the second least significant bit.
 13. The method according to claim 11, wherein for a group of m n-bit data values comprising m second subsets of bits, compression of the second subsets of bits comprises mapping the second subsets of bits collectively onto an m-bit encoding, the m-bit encoding being selected from 2^(m) m-bit encodings, the 2^(m) m-bit encodings comprising a first group of encodings comprising (2^(m)−4) m-bit encodings and a second group of encodings comprising four m-bit encodings; wherein if the selected encoding is an encoding from the first group of encodings then the selected encoding represents a group of m second subsets of bits in which the second least significant bit of each second subset is the same as a respective bit of the m-bit encoding; and wherein if the selected encoding is an encoding from the second group of encodings then the selected encoding represents a group of m second subsets of bits in which all of the second subsets of bits in the group are equal.
 14. The method of claim 13, wherein m=4.
 15. The method of claim 13, wherein the selected encoding represents the m second subsets of bits with no greater error than any of the other 2^(m) m-bit encodings would represent the m second subsets of bits.
 16. A compression unit configured to compress an n-bit data value, the compression unit comprising: dividing logic configured to divide the n bits of the data value into a first subset of bits and a second subset of bits, the first subset comprising the n−2 most significant bits of the data value and the second subset comprising the two least significant bits of the data value; a first compression module configured to implement a first compression scheme to compress the first subset; and a second compression module configured to implement a second compression scheme to compress the second subset, wherein the first and second compression schemes are different.
 17. The compression unit according to claim 16, wherein the second compression module is configured to, for a group of four n-bit data values, compress the second subsets of bits across the group of four n-bit data values by determining: four bits comprising a second least significant bit of each of the data values in the group; and one bit indicative of a least significant bit for the group of four data values.
 18. The compression unit according to claim 16, wherein the second compression module is configured to, for a group of m n-bit data values comprising m second subsets of bits, compress the second subsets of bits by mapping the second subsets of bits collectively onto an m-bit encoding, wherein the second compression module is configured to select the m-bit encoding from 2^(m) m-bit encodings, the 2^(m) m-bit encodings comprising a first group of encodings comprising (2^(m)−4) m-bit encodings and a second group of encodings comprising four m-bit encodings; wherein the second compression module is configured such that if the selected encoding is an encoding from the first group of encodings then the selected encoding represents a group of m second subsets of bits in which the second least significant bit of each second subset is the same as a respective bit of the m-bit encoding; and wherein the second compression module is configured such that if the selected encoding is an encoding from the second group of encodings then the selected encoding represents a group of m second subsets of bits in which all of the second subsets of bits in the group are equal.
 19. The compression unit according to claim 16, wherein the compression unit is embodied in hardware on an integrated circuit.
 20. A non-transitory computer readable storage medium having stored thereon an integrated circuit definition dataset that, when processed in an integrated circuit manufacturing system, configures the integrated circuit manufacturing system to manufacture a compression unit which is configured to compress an n-bit data value, wherein the compression unit comprises: dividing logic configured to divide the n bits of the data value into a first subset of bits and a second subset of bits, the first subset comprising the n−2 most significant bits of the data value and the second subset comprising the two least significant bits of the data value; a first compression module configured to implement a first compression scheme to compress the first subset; and a second compression module configured to implement a second compression scheme to compress the second subset, wherein the first and second compression schemes are different.
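
By way of non-limiting illustration only, the division of an n-bit data value into its n−2 most significant bits and its two least significant bits, and the storage of five bits for the second subsets of a group of four data values, may be sketched in software as follows. The function names are hypothetical and are not taken from this specification. The Boolean expression used for the shared least significant bit (set when at least two of the four least significant bits are set) is an assumption made purely for the example, as is the bit ordering of the five-bit code. Compression of the first subset is not shown.

```python
from typing import List, Tuple


def split_value(value: int, n: int) -> Tuple[int, int]:
    """Divide an n-bit value into (n-2 most significant bits, 2 least significant bits)."""
    if n < 3 or not 0 <= value < (1 << n):
        raise ValueError("value must fit in n bits, with n >= 3")
    return value >> 2, value & 0b11


def compress_second_subsets(values: List[int], n: int) -> int:
    """Pack the second subsets of four n-bit values (8 bits) into a 5-bit code."""
    if len(values) != 4:
        raise ValueError("this sketch handles groups of exactly four values")
    second_subsets = [split_value(v, n)[1] for v in values]
    code = 0
    for i, subset in enumerate(second_subsets):
        # Bit i of the code holds the second least significant bit of value i.
        code |= ((subset >> 1) & 1) << i
    # ASSUMPTION: the shared bit is 1 when at least two of the four
    # least significant bits are 1. Other Boolean expressions are possible.
    shared_lsb = 1 if sum(s & 1 for s in second_subsets) >= 2 else 0
    return code | (shared_lsb << 4)


def decompress_second_subsets(code: int) -> List[int]:
    """Rebuild four 2-bit second subsets from a 5-bit code (lossy)."""
    shared_lsb = (code >> 4) & 1
    return [(((code >> i) & 1) << 1) | shared_lsb for i in range(4)]


if __name__ == "__main__":
    group = [0b101101, 0b000010, 0b111111, 0b010100]  # four 6-bit values
    code = compress_second_subsets(group, n=6)
    print(format(code, "05b"), decompress_second_subsets(code))
```

In this sketch the eight bits of the four second subsets are stored as five bits, i.e. 62.5% of their original size, and the reconstruction is lossy in the least significant bit only.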